Vozo just dropped Visual Translate, an AI tool that translates on-screen text in videos. It promises an end to manual After Effects tracking, but does it actually work?

If you've ever dealt with video localization, you know the drill. Subtitles? A breeze. Dubbing? Solved. But editing the actual hardcoded text inside the video (like slides, diagrams, or UI labels)? Absolute hell. Firing up After Effects, motion tracking, masking old text, adjusting fonts, and rendering... it's a massive pain in the ass that eats up RAM and your soul.
But today, the Product Hunt hivemind is buzzing about a new drop: Visual Translate by Vozo. They claim to have solved the "final missing layer" of video translation. Sounds like some dark magic, so let's break it down.
Vozo was founded by CY, an ex-Google researcher who worked on core video tech for Android. After nailing AI dubbing and lip-sync, the team realized a massive chunk of context—especially in explainer or training videos—lives right there in the visuals.
Here’s what the AI does: it scans the frame -> rips out the existing text -> translates it -> rebuilds it in the video while preserving the original layout, style, and animation.
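Vozo hasn't published how any of this works under the hood, so here's a rough Python sketch of just the data flow described above, under heavy assumptions. Everything in it (`TextRegion`, `fake_detect`, `fake_translate`, the glossary) is a made-up stand-in, not Vozo's API. The point it illustrates: if the translation step only swaps the text and leaves position and style alone, a human can still override any line before export.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TextRegion:
    """One piece of on-screen text found in a frame (hypothetical structure)."""
    text: str
    x: int
    y: int
    width: int
    height: int
    font_size: int


def translate_regions(regions: List[TextRegion],
                      translate: Callable[[str], str]) -> List[TextRegion]:
    """Swap the text of each region; position and style stay untouched."""
    return [replace(r, text=translate(r.text)) for r in regions]


def apply_user_edits(regions: List[TextRegion],
                     edits: Dict[int, str]) -> List[TextRegion]:
    """Let a human override any machine translation, keyed by region index."""
    return [replace(r, text=edits.get(i, r.text)) for i, r in enumerate(regions)]


# Stand-ins for the real detection and translation models (assumptions).
def fake_detect(frame: object) -> List[TextRegion]:
    return [
        TextRegion("Settings", x=40, y=20, width=200, height=48, font_size=32),
        TextRegion("Save", x=40, y=90, width=120, height=40, font_size=28),
    ]


GLOSSARY = {"Settings": "Einstellungen", "Save": "Speichern"}


def fake_translate(text: str) -> str:
    return GLOSSARY.get(text, text)


detected = fake_detect(frame=None)
translated = translate_regions(detected, fake_translate)
# The human-in-the-loop step: the editor disagrees with the second label.
final = apply_user_edits(translated, {1: "Sichern"})

for region in final:
    print(region.text, (region.x, region.y), region.font_size)
```

Removing the original text from the frame and re-rendering with animation are the genuinely hard parts, and they are deliberately left out here.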
But here’s the real MVP feature: it's EDITABLE. They don't just spit out a finalized MP4 and tell you to deal with it. Anyone who has worked with AI knows it hallucinates or produces weird phrasing. Giving users an editor to tweak the translated text is a big-brain move.
Shoutout to Naro, the engineer who built the initial prototype back in October. It’s always the solo dev cooking up a rough PoC in a dark room that ends up pivoting the whole company roadmap. Respect.
Scrolling through the comments, folks are generally hyped, but they brought up some solid edge cases:
1. Video editors shedding tears of joy: One user nailed it: "This could save a lot of manual After Effects work." While the tech world is hyper-fixated on full text-to-video AI generation from scratch, Vozo built a highly practical tool that solves a real, agonizing workflow problem.
2. The "CSS Overflow" dilemma in videos: Someone asked the golden question: "What happens when the translated text is longer?" (e.g., English to German). Vozo claims the system auto-computes a new layout, adjusts font sizes, and handles line breaks to keep it clean. As devs who have fought CSS overflow: hidden for years, we’ll believe it when we see it under stress.
3. RTL Languages (Hebrew/Arabic)? Not today. A user asked about Hebrew. The dev team gave an honest, refreshing answer: RTL isn't supported yet because it's not just about right-aligning text; it often requires flipping the entire visual layout (progress bars, arrows, UI components). No marketing fluff, just straight-up "it's too hard right now, but we're working on it."
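To make the longer-text question from item 2 concrete, here's a minimal Python sketch of one naive way to fit a longer translation into the original text box: wrap the text, and if it doesn't fit, shrink the font one step and try again. This is not Vozo's algorithm (they haven't shared it). The function name `fit_text` and the sizing model are assumptions: each character is treated as 0.6 × font size wide and each line as 1.2 × font size tall, where a real tool would measure actual glyphs.

```python
import textwrap
from typing import List, Optional, Tuple


def fit_text(text: str, box_w: int, box_h: int,
             font_size: int, min_size: int = 10) -> Optional[Tuple[int, List[str]]]:
    """Return (font_size, lines) for the largest size that fits, or None.

    Crude assumptions: character width = 0.6 * size, line height = 1.2 * size.
    Integer math (x10) is used to avoid float rounding surprises.
    """
    for size in range(font_size, min_size - 1, -1):
        # How many characters fit on one line at this size.
        chars_per_line = max(1, (box_w * 10) // (size * 6))
        lines = textwrap.wrap(text, width=chars_per_line)
        # Total height of all lines must stay inside the box.
        if len(lines) * size * 12 <= box_h * 10:
            return size, lines
    return None  # Nothing fits: flag it for a human instead of overflowing.


# The English label fits at its original size.
print(fit_text("Save", box_w=200, box_h=48, font_size=32))

# The German translation is longer, so the font shrinks and the text wraps.
print(fit_text("Einstellungen speichern", box_w=200, box_h=48, font_size=32))
```

Returning `None` instead of forcing the text in is the design choice that matters here: it hands the ugly cases to the editor rather than silently shipping unreadable 6px text.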
From a dev perspective, the biggest takeaway from Vozo's launch is their philosophy: Editability > Full Generation.
Tech bros love building AI tools that generate a final product with a single prompt. But in the real world, professional users hate locked black boxes. They want tools that automate the tedious grunt work but still leave them holding the steering wheel.
Lesson learned: Don't try to use AI to replace the human entirely. Use AI to do the boring tracking/masking crap, and let the human hit "Save & Export". That's how you actually build a SaaS people will pay for.