Is using an LLM as a judge burning a hole in your wallet? Check out Plurai's "vibe-training": a clever way to build SLM guardrails, pitched as 8x cheaper than an LLM judge and running in under 100ms.

AI is getting freakishly smart, but it still does dumb sh*t sometimes, and devs lose sleep writing guardrails to stop their agents from going rogue. Recently, a startup called Plurai dropped a bombshell on Product Hunt and handed the tech world a new buzzword to rally around: "Vibe-training."
Grab a coffee, let’s break down what the hell is actually going on.
The whole thing started when the Plurai team decided to tackle a massive, throbbing pain point for AI devs: evaluating LLMs is slow, expensive, and a total pain in the a**.
The dev community is having a field day with this launch, and the comment section is a goldmine.
The Hype Train: People are absolutely loving the term "vibe-training." It’s catchy. One user jokingly asked if this tool could stop their AI agent from buying overpriced guru courses online. The founder confidently replied: "Yes, and more!"
The Procrastinators: Plenty of devs saw themselves in the founder's pitch. We've all been there: eval pipelines get pushed from Q3 to Q4 and eventually become shelfware, because nobody wants to manually label data.
The Pragmatic Skeptic (The Real MVP): A veteran dev named Sebastian hit them with a hard reality check: "When the SLM and the original LLM judge disagree in production, who do you trust? How do you surface that? That's usually where these systems become shelfware."
The reply from Plurai’s team was an absolute mic drop. They explained they aren't using vanilla BARRED (their base research). Instead, they combine it with AutoPrompt. Their philosophy? You don't resolve disagreements in production; you resolve them during training. By asking the user to label just a few edge cases upfront, they align the judges to the user's actual intent. If a disagreement happens in production, it's treated as a high-value edge case and fed straight back into the debate loop.
Essentially: it learns by arguing. Badass.
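To make the "disagreement becomes training data" idea concrete, here is a minimal sketch. Fair warning: the thread does not show Plurai's actual pipeline, so every name here (Judge, RetrainQueue, audit) is hypothetical, and the two "judges" are toy keyword rules standing in for a real SLM and LLM. It only illustrates the general pattern of running both judges on a sample and queuing the cases where they differ for a human label.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A judge returns True when the text passes the guardrail.
Judge = Callable[[str], bool]


@dataclass
class Disagreement:
    text: str
    slm_verdict: bool
    llm_verdict: bool


@dataclass
class RetrainQueue:
    """Hypothetical holding area for edge cases awaiting a human label."""
    items: List[Disagreement] = field(default_factory=list)


def audit(texts: List[str], slm: Judge, llm: Judge, queue: RetrainQueue) -> int:
    """Run both judges on a sample and queue every case where they differ."""
    found = 0
    for text in texts:
        slm_verdict, llm_verdict = slm(text), llm(text)
        if slm_verdict != llm_verdict:
            queue.items.append(Disagreement(text, slm_verdict, llm_verdict))
            found += 1
    return found


# Toy stand-ins for real models: keyword rules, purely for illustration.
def toy_slm(text: str) -> bool:
    return "password" not in text.lower()


def toy_llm(text: str) -> bool:
    lowered = text.lower()
    return "password" not in lowered and "api key" not in lowered


queue = RetrainQueue()
sample = [
    "Here is your invoice.",
    "Your password is hunter2.",
    "The API key is abc123.",
]
count = audit(sample, toy_slm, toy_llm, queue)
print(count, [d.text for d in queue.items])
```

In this toy run the two judges agree on the first two messages and split on the third, so only that one lands in the queue. In a real setup you would probably audit a small sample of traffic rather than every request, otherwise you are paying for the big model on every check again.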
TL;DR: Plurai looks like a legit game-changer, and "vibe-training" is a masterclass in dev-focused marketing.
The survival lesson here? Stop trying to use a bazooka to kill a mosquito. Using massive LLMs for every single guardrail check is a fast track to burning through your VC money. Use small, fine-tuned models (SLMs) instead.
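If you want to see what that means for your own bill, the arithmetic is quick. The sketch below takes the "8x cheaper" figure from the launch at face value; the per-check LLM price and the traffic volume are made-up placeholders, not numbers from Plurai, so plug in your own.

```python
def monthly_cost(checks_per_day: int, cost_per_check: float, days: int = 30) -> float:
    """Total spend for one guardrail check per request over a month."""
    return checks_per_day * cost_per_check * days


LLM_COST_PER_CHECK = 0.004  # hypothetical USD price, NOT a figure from Plurai
COST_RATIO = 8              # the "8x cheaper" claim from the launch
CHECKS_PER_DAY = 100_000    # hypothetical traffic volume

slm_cost_per_check = LLM_COST_PER_CHECK / COST_RATIO

llm_bill = monthly_cost(CHECKS_PER_DAY, LLM_COST_PER_CHECK)
slm_bill = monthly_cost(CHECKS_PER_DAY, slm_cost_per_check)

print(f"LLM judge:     ${llm_bill:,.2f} per month")
print(f"SLM guardrail: ${slm_bill:,.2f} per month")
print(f"Saved:         ${llm_bill - slm_bill:,.2f} per month")
```

The ratio stays at 8x whatever you plug in, but the absolute gap grows with traffic, which is why this matters most for agents that run a check on every single message.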
Before deploying your next big agent on a cheap VPS for the world to see, take evaluation seriously. If you don't, your rogue agent might start insulting users or leaking data, and guess whose head will be on the chopping block? Yep, yours.
Source: Plurai on Product Hunt