Coding4Food

© 2026 Coding4Food. Written by devs, for devs.

Technology · AI & Automation

M5 Max 128GB Put to the Local LLM Test: A Python Venv Nightmare and Raw Benchmarks

March 12, 2026 · 3 min read

A Redditor got the M5 Max 128GB and tortured it with massive Local LLMs. See the raw MLX benchmarks, the RAM-hogging stats, and the dev drama behind it.

Original source: https://coding4food.com/post/m5-max-128gb-local-llm-benchmark
Tags: m5 max benchmark, localllama, apple silicon running AI, mlx_lm, test llm m5 max, qwen3.5


We've all been hearing whispers about the Apple M5 Max, but the waiting game is over. A madlad on Reddit going by cryingneko just got their hands on the 14-inch M5 Max with a beefy 128GB of RAM and immediately decided to torture-test it with massive local LLMs. Why bother setting up a cloud VPS when you can literally melt your shiny new laptop, right?

The "Hold My Beer" Moment and the Python Venv Curse

OP came in hot, promising raw numbers, no fluff, no 20-minute YouTube video telling you to hit subscribe. Just straight-up benchmarks. But as any dev knows, the universe hates a cocky programmer.

The numbers got delayed. Why? Because OP initially ran the tests using BatchGenerator, and the token generation speeds were absolute garbage. Instead of posting bogus stats, OP did what any sane developer would do: panicked, trashed the setup, spun up a pristine fresh Python virtual environment, and re-ran everything using pure mlx_lm with stream_generate.
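OP's fix boils down to timing the streaming generator directly instead of trusting a batching wrapper's numbers. Here's a minimal sketch of that measurement pattern, using a stand-in generator since a real `mlx_lm.stream_generate` call needs Apple Silicon and a downloaded model (the helper names below are made up for illustration):

```python
import time

def tokens_per_second(stream):
    """Drain a token stream and return (token_count, tokens/sec). Hypothetical helper."""
    start = time.perf_counter()
    count = sum(1 for _ in stream)
    elapsed = time.perf_counter() - start
    return count, count / elapsed if elapsed > 0 else 0.0

def fake_stream(n):
    """Stand-in for a real call like mlx_lm.stream_generate(model, tokenizer, prompt=...)."""
    for i in range(n):
        yield f"tok{i}"

count, tps = tokens_per_second(fake_stream(1000))
print(f"{count} tokens at {tps:.0f} tok/s")
```

The point is that the clock wraps the raw generator and nothing else, so whatever throughput you report is what the model actually streamed.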

Moral of the story: Your $4,000 machine is only as fast as your spaghetti code and the dependencies you blindly pip install.

The RAM-Gobbling Numbers

Once the environment was sorted, OP dropped the logs. Here's what happens when you push the M5 Max to its limits with AI models:

  • Qwen3.5-122B-A10B-4bit: This beast casually chewed through 76.397 GB of peak memory. Prompt processing went brrr at over 1239 tokens/sec, while generation hovered steadily between 54 and 65 tokens/sec.
  • Qwen3-Coder-Next-8bit: Say goodbye to your RAM. This model peaked at 92.605 GB when dealing with a 65k context window. Prompt speeds spiked to 1887 tokens/sec, but generation dropped to 48-79 tokens/sec depending on the load.
  • gpt-oss-120b-MXFP4-Q8: The absolute speed demon of the bunch. It processed prompts at an insane 2710 tokens/sec. Generation was smooth at 64-87 tokens/sec, and, surprisingly, it was gentle on the RAM, peaking at only 65 GB.

The only slight disappointment was the Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-6bit, which crawled along at 14-23 tokens/sec. OP also wanted to test the Qwen 35B but forgot to download it. Classic.
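Those peak-memory figures roughly track what back-of-envelope quantization math predicts. A rough sketch (the 10% overhead factor for quantization scales and runtime buffers is an assumption, and KV cache comes on top of the weights):

```python
def quantized_weights_gb(n_params, bits_per_weight, overhead=1.10):
    """Weights-only memory estimate in GB; the 10% overhead is a guess for quant scales/buffers."""
    return n_params * bits_per_weight / 8 * overhead / 1e9

# 122B parameters at 4-bit: weights alone land near 67 GB, so the observed
# ~76 GB peak leaves plausible headroom for KV cache and runtime state.
print(round(quantized_weights_gb(122e9, 4), 1))
```

Note that a MoE layout like A10B doesn't shrink this number: all experts still sit in memory, only the active-parameter count per token is smaller.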

The Reddit Peanut Gallery Reacts

With over 1.3k upvotes, the post blew up and r/LocalLLaMA went nuts. But while OP was fighting with Python packages, the comment section was doing its thing:

  • The Impatient Ones: User No_Afternoon_4260 brought the sarcasm early: "Been 10 minutes, where are the benchmarks? /S". Another chimed in: "Its already 14min without benchmarks. What is OP even doing".
  • The Copium Inhalers: sammcj was eagerly waiting for the 27B model numbers, crying in the corner because "Mine arrives in two weeks!".

The Senior Dev Takeaway

Beyond seeing the ridiculous capabilities of Apple's Unified Memory architecture (which makes running 100B+ parameter models locally actually viable), there's a vital lesson here.

Always double-check your tooling. OP almost published garbage benchmarks just because BatchGenerator wasn't playing nice. If your numbers look weird, don't blame the silicon immediately—check your packages, your environment, and your code.
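One cheap sanity check before publishing numbers: confirm which interpreter and environment the benchmark is actually running in. Standard library only, no assumptions beyond CPython's usual venv behavior:

```python
import sys

print(sys.executable)                    # the interpreter actually running the benchmark
in_venv = sys.prefix != sys.base_prefix  # standard way to detect an active venv
print("inside a venv:", in_venv)
```

If that path points at your system Python instead of the fresh venv you just built, your "clean" benchmark run wasn't clean.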

The M5 Max is clearly a beast for local AI. If you have the budget, go nuts. The rest of us mere mortals will keep paying for API calls and crying into our 16GB of RAM.

Source: Reddit - r/LocalLLaMA