A two-person team at Nari Labs has just dropped Dia, a powerful new open source text-to-speech model, and it is turning heads across the AI landscape. Built with zero funding and trained on Google TPUs through the TPU Research Cloud program, Dia aims to beat industry leaders like ElevenLabs, OpenAI’s gpt-4o-mini-tts, and Google’s NotebookLM podcast tool. What sets it apart? Natural dialogue delivery, emotional tone, and the ability to interpret nonverbal cues like laughter and coughing, all from plain text.
Openly available on GitHub and Hugging Face under an Apache 2.0 license, Dia is already making waves in audio generation communities with example demos showing it outperforming proprietary models. With advanced voice control, voice cloning, and expressive performance baked in, this model isn’t just for coders—it’s built for anyone who wants to generate high-quality spoken content.
Key Points
Dia is a 1.6B parameter text-to-speech (TTS) model that rivals and, in many cases, outperforms competitors like ElevenLabs Studio, Sesame CSM-1B, and NotebookLM’s podcast generator.
Built by just two engineers at Nari Labs, the model was created with no external funding and is fully open source.
Dia supports emotional tones, speaker tagging, and nonverbal audio cues (like laughs or coughs), giving it a lifelike, conversational edge.
Side-by-side comparisons show Dia’s superior ability to handle rhythmic complexity, emotional transitions, and expressive delivery.
Voice cloning and prompt-based tone matching are included, expanding creative and commercial applications.
Available under Apache 2.0 license, making it legally viable for commercial use.
Nari Labs explicitly bans misuse, including impersonation, misinformation, and illegal activity.
Developers can try it with a Gradio-based demo, and a consumer-friendly remix tool is on the way.
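To make the speaker-tagging and nonverbal-cue features above concrete, here is a minimal sketch of how a Dia-style dialogue script might be assembled in Python. The `[S1]`/`[S2]` speaker tags and parenthesized cues such as `(laughs)` follow the format shown in Nari Labs’ published examples; the `Dia` class, `from_pretrained`, and `generate` names in the commented usage are assumptions based on the project’s README and may differ between releases.

```python
def build_script(turns):
    """Join (speaker, line) pairs into a single Dia-style prompt string.

    Nonverbal cues can be embedded directly in a line, e.g. "(laughs)"
    or "(coughs)"; Dia renders these as actual audio rather than
    spoken words.
    """
    return " ".join(f"[{speaker}] {line}" for speaker, line in turns)


script = build_script([
    ("S1", "Have you tried the new open source TTS model? (laughs)"),
    ("S2", "I have, and the dialogue delivery sounds surprisingly natural."),
])

# Hypothetical usage, assuming the loading interface sketched in the
# project's GitHub README (not verified here; the model weights must
# be downloaded from Hugging Face first):
#
#   from dia.model import Dia
#   model = Dia.from_pretrained("nari-labs/Dia-1.6B")
#   audio = model.generate(script)  # waveform, ready to save as .wav

print(script)
```

The same script string can be pasted directly into the Gradio demo mentioned below, which accepts tagged multi-speaker text in its input box.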
Key Quotes
“Dia rivals NotebookLM’s podcast feature while surpassing ElevenLabs Studio and Sesame’s open model in quality.” – Toby Kim, Co-creator of Dia
“We were not AI experts from the beginning… None of them sounded like real human conversation.” – Kim, recounting their motivation
“Dia interprets and delivers actual laughter, whereas ElevenLabs and Sesame output textual substitutions like ‘haha’.” – From Nari Labs’ audio comparison tests
“We wanted more—more control over the voices, more freedom in the script.” – Toby Kim on why they built Dia
Implications
Dia represents a significant moment for open source voice tech. It’s a rare example of a small, grassroots AI team challenging the dominance of major players with a model that combines accessibility with expressiveness. By making it commercially usable out of the box and prioritizing responsible development, Nari Labs is positioning Dia as a go-to solution for developers, educators, content creators, and more.
This isn’t just about better voice synthesis—it’s about democratizing the ability to create digital voices that feel human. And if this is what two people can build without funding, the ripple effects across AI development could be massive.