© 2026 Coding4Food. Written by devs, for devs.


Needle: Shrinking Gemini's Tool Calling into a 26M Pocket-Sized Model

May 14, 2026 | 3 min read

The mad lads at Cactus packed Gemini-level tool calling into a tiny 26M model by ditching FFNs. Here's why this micro-AI is a massive deal for edge computing.

Original source: https://coding4food.com/post/needle-ai-shrinks-gemini-tool-calling-into-26m-model. Content is property of Coding4Food.

Scrolling through Hacker News while my laptop fan sounded like a jet engine, I stumbled upon a wild "Show HN": distilling Gemini's tool calling into a tiny 26M-parameter model. The mad lads at Cactus just open-sourced this pocket-sized beast, promising it runs blazingly fast on budget phones and wearables.

TL;DR: Honey, I Shrunk the LLM

Henry from the Cactus team just dropped Needle, and its specs are beautifully unhinged:

  • Microscopic Size: Exactly 26M parameters (smaller than an average frontend dev's node_modules folder).
  • Insane Speed: Hits 6000 tok/s prefill and 1200 tok/s decode directly on consumer devices.
  • The Core Insight: Tool calling is essentially a retrieval-and-assembly problem (match query -> grab arguments -> spit out JSON). It doesn't require deep, philosophical reasoning. Massive models are total overkill here.
  • The "Heretical" Architecture: Simple Attention Networks. Zero MLPs or FFNs anywhere. The author argues FFN parameters are wasted space if the facts are already provided in the input (like in RAG or tool use scenarios).
  • Training Montage: Pretrained on 200B tokens (27 hours on 16 TPU v6e), then post-trained for 45 minutes on 2B tokens of synthetic Gemini data.
  • The Flex: It straight-up beats larger models like FunctionGemma-270M and Qwen-0.6B in single-shot tool calling.
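To make the "retrieval-and-assembly" framing concrete, here is a deliberately dumb sketch of the three steps from the list above: match the query to a tool, copy the arguments out of the input, emit JSON. Everything in it is an assumption for illustration: the `TOOLS` registry, the tool names, and the keyword and regex matching are made up, and this is not how Needle works internally (Needle learns the matching with attention). It only shows that the task is mostly lookup and copying rather than deep reasoning.

```python
import json
import re

# Hypothetical tool registry, invented for this illustration.
TOOLS = [
    {
        "name": "set_light",
        "keywords": {"light", "lamp", "bulb"},
        "params": {"state": r"\b(on|off)\b", "room": r"\bin the (\w+)"},
    },
    {
        "name": "set_timer",
        "keywords": {"timer", "remind", "alarm"},
        "params": {"minutes": r"\b(\d+)\s*min"},
    },
]


def call_tool(query: str) -> str:
    text = query.lower()
    words = set(re.findall(r"\w+", text))
    # 1. Retrieval: pick the tool whose keywords overlap the query most.
    tool = max(TOOLS, key=lambda t: len(t["keywords"] & words))
    # 2. Assembly: copy argument values straight out of the input text.
    args = {}
    for name, pattern in tool["params"].items():
        match = re.search(pattern, text)
        if match:
            args[name] = match.group(1)
    # 3. Emit the call as JSON.
    return json.dumps({"name": tool["name"], "arguments": args})


print(call_tool("Turn on the light in the kitchen"))
```

Note that every argument value is copied from the query itself. Nothing has to be "remembered" by the model, which is the situation the no-FFN argument relies on.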

What the HN armchair experts are saying

(It's fresh off the press, but you can already predict the dev community's split reactions.)

  • The Pragmatists: Thank god! Using a 70B-parameter model just to turn on a smart bulb or format a JSON string is a massive waste of VRAM and electricity. Building small AI tools like this is the future of mobile apps.
  • The Skeptics: Dropping FFNs sounds super sus. Sure, it dominates single-shot function calling, but how does it hold up in multi-turn conversational agents with complex context?
  • The Local LLM Junkies: Downloading instantly to fine-tune on a Mac. Doing this locally means no more renting expensive cloud VPS instances just to parse some data.
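A quick back-of-the-envelope check on the VRAM and speed talk, as a rough sketch only. The assumptions are mine, not from the Show HN post: weights stored at 2 bytes per parameter (16-bit), and a hypothetical call with 300 prompt tokens (tool schema plus query) and 40 output tokens. The 6000 and 1200 tok/s figures are the quoted ones, and real latency will include overhead this ignores.

```python
def weights_mb(params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in megabytes (1 MB = 1e6 bytes)."""
    return params * bytes_per_param / 1e6


def call_latency_ms(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float = 6000, decode_tps: float = 1200) -> float:
    """Ideal latency of one call, using the quoted throughput figures."""
    return 1000 * (prompt_tokens / prefill_tps + output_tokens / decode_tps)


print(f"70B model: {weights_mb(70e9):,.0f} MB")           # 140,000 MB
print(f"26M model: {weights_mb(26e6):,.0f} MB")           # 52 MB
print(f"one call:  {call_latency_ms(300, 40):.0f} ms")    # 83 ms
```

Under those assumptions the weights shrink from roughly 140 GB to about 52 MB, and a single tool call lands well under a tenth of a second.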

The C4F Verdict: Right tool for the job

We devs love our hype trains. We often throw an entire A100 GPU cluster at a problem that could realistically be solved with a regex or, in this case, a 26M model. Needle is a beautiful reminder that architecture and a well-defined use case matter more than brute size, especially for edge computing.
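For a sense of how big an architectural lever dropping the FFNs is, here is a small parameter count for one textbook transformer layer. This is a sketch under stated assumptions, not Needle's actual configuration: standard multi-head attention with four d_model x d_model projections, an FFN with the usual 4x expansion, biases and norms ignored, and a hypothetical `d_model` of 512.

```python
def layer_params(d_model: int, ffn_mult: int = 4) -> dict:
    """Weight counts for one textbook transformer layer (biases and norms ignored)."""
    attention = 4 * d_model * d_model          # Q, K, V and output projections
    ffn = 2 * d_model * (ffn_mult * d_model)   # up- and down-projection
    return {"attention": attention, "ffn": ffn}


p = layer_params(d_model=512)  # hypothetical width, not Needle's
share = p["ffn"] / (p["attention"] + p["ffn"])
print(p, f"FFN share: {share:.0%}")  # FFN share: 67%
```

In that textbook layout the FFN holds about two thirds of each layer's weights, so an attention-only design can spend its whole budget on the matching-and-copying machinery.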

Stop forcing an AI to be Shakespeare when you just need a JSON parser. Kudos to Cactus for open-sourcing it. Go download the weights and play around with it, folks!

Source: Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model