A Redditor got the M5 Max 128GB and tortured it with massive Local LLMs. See the raw MLX benchmarks, the RAM-hogging stats, and the dev drama behind it.

We've all been hearing the whispers about the Apple M5 Max, but the waiting game is over. A madlad on Reddit going by cryingneko just got their hands on the M5 Max 14-inch with a beefy 128GB of RAM and immediately decided to torture-test it with massive Local LLMs. Why bother setting up a cloud VPS when you can literally melt your shiny new laptop, right?
OP came in hot, promising raw numbers, no fluff, no 20-minute YouTube video telling you to hit subscribe. Just straight-up benchmarks. But as any dev knows, the universe hates a cocky programmer.
The numbers got delayed. Why? Because OP initially ran the tests using BatchGenerator, and the token generation speeds were absolute garbage. Instead of posting bogus stats, OP did what any sane developer would do: panicked, trashed the setup, spun up a fresh Python virtual environment, and re-ran everything using pure mlx_lm with stream_generate.
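For the curious, that re-run approach boils down to something like the sketch below: stream tokens with mlx_lm's stream_generate and time the generation yourself. This is a minimal sketch, assuming mlx_lm is installed on an Apple-silicon machine; the function name benchmark_generation is ours, and it is not the exact script OP ran.

```python
# Sketch of timing token generation with mlx_lm's stream_generate.
# Assumes mlx-lm is installed and running on Apple silicon; this is
# an illustration, not OP's actual benchmark script.
import time


def tokens_per_sec(n_tokens: int, elapsed: float) -> float:
    """Throughput in tokens/sec, guarding against a zero elapsed time."""
    return n_tokens / elapsed if elapsed > 0 else 0.0


def benchmark_generation(model_name: str, prompt: str, max_tokens: int = 256) -> float:
    """Load a model and measure streamed generation speed."""
    from mlx_lm import load, stream_generate  # requires Apple silicon

    model, tokenizer = load(model_name)
    n_tokens = 0
    start = time.perf_counter()
    for _response in stream_generate(model, tokenizer, prompt, max_tokens=max_tokens):
        n_tokens += 1  # one streamed response per generated token
    return tokens_per_sec(n_tokens, time.perf_counter() - start)
```

On a real M-series machine you would call benchmark_generation with an MLX-converted model id and a prompt, and get back a generation tokens/sec figure comparable to the numbers OP posted.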
Moral of the story: Your $4,000 machine is only as fast as your spaghetti code and the dependencies you blindly pip install.
Once the environment was sorted, OP dropped the logs. Here's what happens when you push the M5 Max to its limits with AI models:
- One run peaked at 76.397 GB of memory. Prompt processing went brrr at over 1239 tokens/sec, while generation hovered steadily between 54 - 65 tokens/sec.
- Another hit 92.605 GB when dealing with a 65k context window. Prompt speeds spiked to 1887 tokens/sec, but generation dropped to 48 - 79 tokens/sec depending on the load.
- A third pushed prompt processing to 2710 tokens/sec. Generation was smooth at 64 - 87 t/s, and surprisingly, it was gentle on the RAM, peaking at only 65 GB.
- The only slight disappointment was the Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-6bit, which crawled at 14 - 23 tokens/sec.
- OP also wanted to test the Qwen 35B but forgot to download it. Classic.
With over 1.3k upvotes, the post blew up and r/LocalLLaMA went nuts. While OP was fighting with Python packages, the comment section was doing its thing:
- No_Afternoon_4260 brought the sarcasm early: "Been 10 minutes, where are the benchmarks? /S".
- Another chimed in: "Its already 14min without benchmarks. What is OP even doing".
- sammcj was eagerly waiting for the 27B model numbers, crying in the corner because "Mine arrives in two weeks!".

Beyond seeing the ridiculous capabilities of Apple's Unified Memory architecture (which makes running 100B+ parameter models locally actually viable), there's a vital lesson here.
Always double-check your tooling. OP almost published garbage benchmarks just because BatchGenerator wasn't playing nice. If your numbers look weird, don't blame the silicon immediately—check your packages, your environment, and your code.
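In that spirit, here's a quick stdlib-only sanity check you can run before trusting any benchmark: print the interpreter and the versions of the packages you're measuring with, so a stale or missing dependency shows up immediately. The function name environment_report is ours, a sketch rather than anything from OP's setup.

```python
# Environment sanity check before trusting benchmark numbers: report
# the interpreter and the installed version of each package of
# interest. Pure stdlib; environment_report is an illustrative helper.
import sys
from importlib import metadata


def environment_report(packages):
    """Map each package name to its installed version, or 'not installed'."""
    report = {"python": sys.version.split()[0], "executable": sys.executable}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report
```

Running environment_report(["mlx-lm"]) in a supposedly fresh venv tells you in one line whether you're actually benchmarking the package you think you are.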
The M5 Max is clearly a beast for local AI. If you have the budget, go nuts. As for the rest of us mere mortals, we'll just keep paying for API calls and crying into our 16GB of RAM.
Source: Reddit - r/LocalLLaMA