Coding4Food

© 2026 Coding4Food. Written by devs, for devs.

M5 Max 128GB Put to the Local LLM Test: A Python Venv Nightmare and Raw Benchmarks

March 12, 2026 · 3 min read

A Redditor got the M5 Max 128GB and tortured it with massive Local LLMs. See the raw MLX benchmarks, the RAM-hogging stats, and the dev drama behind it.

Original source: https://coding4food.com/post/m5-max-128gb-local-llm-benchmark. Content is property of Coding4Food.

We've all been hearing the whispers about the Apple M5 Max, but the waiting game is over. A madlad on Reddit going by cryingneko just got their hands on the 14-inch M5 Max with a beefy 128GB of RAM and immediately decided to torture-test it with massive local LLMs. Why bother setting up a cloud VPS when you can literally melt your shiny new laptop, right?

The "Hold My Beer" Moment and the Python Venv Curse

OP came in hot, promising raw numbers, no fluff, no 20-minute YouTube video telling you to hit subscribe. Just straight-up benchmarks. But as any dev knows, the universe hates a cocky programmer.

The numbers got delayed. Why? Because OP initially ran the tests using BatchGenerator, and the token generation speeds were absolute garbage. Instead of posting bogus stats, OP did what any sane developer would do: panicked, trashed the setup, spun up a pristine fresh Python virtual environment, and re-ran everything using pure mlx_lm with stream_generate.

Moral of the story: Your $4,000 machine is only as fast as your spaghetti code and the dependencies you blindly pip install.
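
To make the re-run concrete, here is a minimal sketch of timing a token stream by hand. It is not OP's script: measure_stream and fake_stream are hypothetical helpers, and the mlx_lm usage in the trailing comment is an assumption about how stream_generate might be wired in (the model path is a placeholder). The helper separates time to first token, which is roughly prompt processing, from the steady generation rate.

```python
import time
from typing import Callable, Dict, Iterable


def measure_stream(
    stream: Iterable[object],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Time any token stream: token count, time to first token, generation tokens/sec."""
    start = clock()
    first = last = None
    count = 0
    for _ in stream:
        now = clock()
        if first is None:
            first = now  # everything before this point is prompt processing / warm-up
        last = now
        count += 1
    if count == 0:
        return {"tokens": 0, "time_to_first_token_s": 0.0, "generation_tps": 0.0}
    gen_elapsed = last - first
    # Rate over the tokens that arrived after the first one.
    gen_tps = (count - 1) / gen_elapsed if gen_elapsed > 0 else 0.0
    return {
        "tokens": count,
        "time_to_first_token_s": first - start,
        "generation_tps": gen_tps,
    }


if __name__ == "__main__":
    def fake_stream(n: int, delay_s: float):
        """Stand-in for a real model stream: yields n dummy tokens."""
        for i in range(n):
            time.sleep(delay_s)
            yield f"tok{i}"

    print(measure_stream(fake_stream(20, 0.01)))

    # Assumed usage with mlx_lm (needs Apple silicon and the mlx-lm package;
    # the model path below is a placeholder, not a value from the post):
    #   from mlx_lm import load, stream_generate
    #   model, tokenizer = load("<path-or-repo-of-an-MLX-model>")
    #   print(measure_stream(stream_generate(model, tokenizer, "Hello", max_tokens=256)))
```

Because the clock is injectable, the same helper can be checked against a fake clock before you trust it on real hardware, which is kind of the whole point of this story.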

The RAM-Gobbling Numbers

Once the environment was sorted, OP dropped the logs. Here's what happens when you push the M5 Max to its limits with AI models:

  • Qwen3.5-122B-A10B-4bit: This beast casually chewed through 76.397 GB of peak memory. Prompt processing went brrr at over 1239 tokens/sec, while generation held steady at 54 to 65 tokens/sec.
  • Qwen3-Coder-Next-8bit: Say goodbye to your RAM. This model peaked at 92.605 GB when dealing with a 65k context window. Prompt speeds spiked to 1887 tokens/sec, and generation ranged from 48 to 79 tokens/sec depending on the load.
  • gpt-oss-120b-MXFP4-Q8: The absolute speed demon of the bunch. It processed prompts at an insane 2710 tokens/sec. Generation was smooth at 64 to 87 tokens/sec, and surprisingly, it was gentle on the RAM, peaking at only 65 GB.

The only slight disappointment was the Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-6bit, which crawled at 14 - 23 tokens/sec. OP also wanted to test the Qwen 35B but forgot to download it. Classic.
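
For a feel of what those figures mean in practice, here is a quick back-of-the-envelope sketch using only the numbers quoted above. The assumptions are loud ones: headroom is computed against the nominal 128 GB (the system keeps some of that for itself, so real headroom is smaller), and the 65,536-token prompt is a hypothetical size run at the reported prompt rate, which may not hold across a full context.

```python
TOTAL_RAM_GB = 128.0  # nominal capacity of the machine in the post

# (model, peak memory GB, prompt tokens/sec, generation tokens/sec low, high)
# Values are copied from the figures reported in the article.
RESULTS = [
    ("Qwen3.5-122B-A10B-4bit", 76.397, 1239, 54, 65),
    ("Qwen3-Coder-Next-8bit", 92.605, 1887, 48, 79),
    ("gpt-oss-120b-MXFP4-Q8", 65.0, 2710, 64, 87),
]


def headroom_gb(peak_gb: float, total_gb: float = TOTAL_RAM_GB) -> float:
    """Memory left over at peak, measured against the nominal total."""
    return round(total_gb - peak_gb, 3)


def prompt_seconds(prompt_tokens: int, prompt_tps: float) -> float:
    """Rough time to ingest a prompt at a constant prompt-processing rate."""
    return prompt_tokens / prompt_tps


if __name__ == "__main__":
    hypothetical_prompt = 65_536  # assumption: illustrative prompt size, not from the post
    for name, peak, prompt_tps, gen_lo, gen_hi in RESULTS:
        print(
            f"{name:24s} peak {peak:7.3f} GB "
            f"({peak / TOTAL_RAM_GB:5.1%} of RAM), "
            f"headroom {headroom_gb(peak):6.3f} GB, "
            f"{hypothetical_prompt} prompt tokens in about "
            f"{prompt_seconds(hypothetical_prompt, prompt_tps):5.1f} s, "
            f"generation {gen_lo} to {gen_hi} tokens/sec"
        )
```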

The Reddit Peanut Gallery Reacts

With over 1.3k upvotes, the post blew up, and the LocalLLaMA Discord went nuts. But while OP was fighting with Python packages, the comment section was doing its thing:

  • The Impatient Ones: User No_Afternoon_4260 brought the sarcasm early: "Been 10 minutes, where are the benchmarks? /S". Another chimed in: "Its already 14min without benchmarks. What is OP even doing".
  • The Copium Inhalers: sammcj was eagerly waiting for the 27B model numbers, crying in the corner because "Mine arrives in two weeks!".

The Senior Dev Takeaway

Beyond seeing the ridiculous capabilities of Apple's unified memory architecture (which makes running 100B+ parameter models locally actually viable), there's a vital lesson here.

Always double-check your tooling. OP almost published garbage benchmarks just because BatchGenerator wasn't playing nice. If your numbers look weird, don't blame the silicon immediately. Check your packages, your environment, and your code first.
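
One cheap way to act on that is to log which interpreter and package versions produced a benchmark before you post it. The sketch below is an illustration using only the standard library, not something from the thread. in_virtualenv and package_versions are hypothetical helper names, and the package names in the example call ("mlx", "mlx-lm") are assumed to be the relevant distributions.

```python
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional


def in_virtualenv() -> bool:
    """True when running inside a venv (prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python executable:", sys.executable)
    print("inside a venv:", in_virtualenv())
    # Assumption: these are the distributions that matter for the benchmark.
    for pkg, version in package_versions(["mlx", "mlx-lm"]).items():
        print(f"{pkg}: {version or 'not installed'}")
```

Pasting that output next to your numbers makes it much easier for someone else to spot a stale or mismatched dependency.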

The M5 Max is clearly a beast for local AI. If you have the budget, go nuts. As for the rest of us mere mortals, we'll just keep paying for API calls and crying in 16GB RAM.

Source: Reddit - r/LocalLLaMA