Nari Labs is a South Korean startup developing open-source text-to-speech (TTS) technology. Founded by two undergraduate students, including Toby Kim, the company has built Dia, a 1.6-billion-parameter open-source TTS model that generates highly realistic dialogue from text transcripts. Released under the Apache 2.0 license, Dia aims to rival proprietary offerings such as ElevenLabs, Google's NotebookLM, and Sesame.
Dia stands out for producing natural-sounding conversations with customizable speaker tones, emotional inflections, and nonverbal cues such as laughter, coughs, and sighs. This expressiveness makes it well suited to virtual assistants, games, audiobooks, and accessibility tools.
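The project's README describes transcripts in which speakers are marked with tags such as [S1] and [S2] and nonverbal cues are written in parentheses, for example (laughs). Assuming that convention (check the current repository documentation before relying on it), a minimal sketch of assembling such a script in Python might look like this. The `build_transcript` helper and the `Turn` type are hypothetical illustrations, not part of Dia's code.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical type: (speaker number, spoken text, optional nonverbal cue).
Turn = Tuple[int, str, Optional[str]]


def build_transcript(turns: Sequence[Turn]) -> str:
    """Format dialogue turns as a Dia-style transcript string.

    Assumes the [S1]/[S2] speaker tags and parenthesized cues shown in the
    project's README; this sketch therefore accepts only speakers 1 and 2.
    """
    parts = []
    for speaker, text, cue in turns:
        if speaker not in (1, 2):
            raise ValueError(f"unsupported speaker number: {speaker}")
        segment = f"[S{speaker}] {text.strip()}"
        if cue:
            # Append the nonverbal cue after the spoken line, e.g. "(laughs)".
            segment += f" ({cue})"
        parts.append(segment)
    # Turns are joined with single spaces into one transcript string.
    return " ".join(parts)


script = build_transcript([
    (1, "Did you hear the news?", None),
    (2, "I did. I can't believe it.", "laughs"),
    (1, "Me neither.", "sighs"),
])
print(script)
# → [S1] Did you hear the news? [S2] I did. I can't believe it. (laughs) [S1] Me neither. (sighs)
```

The resulting string is what would be passed to the model as its text input; keeping the formatting in one helper makes it easy to adjust if the tag convention changes.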
Dia also supports zero-shot voice cloning: given just a few seconds of reference audio, it can reproduce that speaker's voice without additional training. This makes personalized voice experiences more accessible to developers and content creators, enabling custom voice assistants and localized content with minimal setup.
As an open-source project, Dia is available on GitHub and Hugging Face with pretrained checkpoints, inference code, and a Gradio-based demo for quick testing. This openness invites community contributions and could speed up progress in voice technology.