
Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning


Neuphonic has released NeuTTS Air, an open-source text-to-speech (TTS) speech language model designed to run locally in real time on CPUs. The Hugging Face model card lists 748M parameters (Qwen2 architecture), and the model ships in GGUF quantizations (Q4/Q8), enabling inference via llama.cpp/llama-cpp-python without cloud dependencies. It is licensed under Apache-2.0 and includes a runnable demo and examples.
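The Q4/Q8 GGUF options translate directly into disk and memory budgets. A back-of-envelope sketch (the ~4.5 and ~8.5 effective bits per weight are typical for Q4_0/Q8_0-style GGUF quantization with its scale overhead, not published figures for this model):

```python
# Back-of-envelope GGUF footprint for a 748M-parameter model.
# Effective bits/weight (4.5 for Q4-style, 8.5 for Q8-style) are assumptions
# based on typical GGUF quantization overhead, not published specs.
PARAMS = 748_000_000

def gguf_size_gb(params: int, bits_per_weight: float) -> float:
    """Approximate on-disk size in GiB for a given effective bit width."""
    return params * bits_per_weight / 8 / (1024 ** 3)

q4 = gguf_size_gb(PARAMS, 4.5)   # roughly 0.39 GiB
q8 = gguf_size_gb(PARAMS, 8.5)   # roughly 0.74 GiB
fp16 = gguf_size_gb(PARAMS, 16)  # roughly 1.39 GiB unquantized baseline
print(f"Q4 ≈ {q4:.2f} GiB, Q8 ≈ {q8:.2f} GiB, FP16 ≈ {fp16:.2f} GiB")
```

Either quantization fits comfortably in the RAM of a laptop or Raspberry Pi-class board, which is the point of the CPU-first distribution.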

So, what’s new?

NeuTTS Air couples a 0.5B-class Qwen backbone with Neuphonic's NeuCodec audio codec. Neuphonic positions the system as a "super-realistic, on-device" TTS LM that clones a voice from ~3 seconds of reference audio and synthesizes speech in that style, targeting voice agents and privacy-sensitive applications. The model card and repository explicitly emphasize real-time CPU generation and small-footprint deployment.

Key Features

  • Realism at sub-1B scale: Human-like prosody and timbre preservation from a ~0.7B (Qwen2-class) text-to-speech LM.
  • On-device deployment: Distributed in GGUF (Q4/Q8) with CPU-first paths; suitable for laptops, phones, and Raspberry Pi-class boards.
  • Instant speaker cloning: Style transfer from ~3 seconds of reference audio (reference WAV + transcript).
  • Compact LM+codec stack: Qwen 0.5B backbone paired with NeuCodec (0.8 kbps / 24 kHz) to balance latency, footprint, and output quality.
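The 0.8 kbps / 24 kHz operating point implies an aggressive compression ratio. A quick check against a plain PCM baseline (the 16-bit mono assumption is ours; only the 0.8 kbps and 24 kHz figures come from the card):

```python
# Compression implied by NeuCodec's stated operating point: a 0.8 kbps token
# stream decoded to 24 kHz audio. The 16-bit mono PCM baseline is an assumption.
CODEC_BPS = 800             # 0.8 kbps token stream
SAMPLE_RATE = 24_000        # Hz, decoded output audio
PCM_BPS = SAMPLE_RATE * 16  # 16-bit mono PCM baseline

ratio = PCM_BPS / CODEC_BPS            # 480x smaller than raw PCM
bytes_per_minute = CODEC_BPS / 8 * 60  # 6000 bytes of tokens per minute of speech
print(f"{ratio:.0f}x smaller than PCM; {bytes_per_minute / 1000:.1f} kB/min")
```

At roughly 6 kB of tokens per minute of speech, the acoustic representation the LM has to produce stays tiny, which is what keeps CPU-only generation plausible.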

Model architecture and runtime path

  • Backbone: Qwen 0.5B serves as a lightweight LM to condition speech generation; the hosted artifact is reported as 748M params under the qwen2 architecture on Hugging Face.
  • Codec: NeuCodec provides low-bitrate acoustic tokenization/decoding; it targets 0.8 kbps with 24 kHz output, enabling compact representations for efficient on-device use.
  • Quantization & format: Prebuilt GGUF checkpoints (Q4/Q8) are available; the repo includes instructions for llama-cpp-python and an optional ONNX decoder path.
  • Dependencies: Uses espeak for phonemization; examples and a Jupyter notebook are provided for end-to-end synthesis.
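Conceptually, the runtime path interleaves the reference transcript, the reference's codec tokens, and the target text into one prompt for the backbone, which then emits new codec tokens for NeuCodec to decode. The tag names and template below are purely illustrative, hypothetical stand-ins; NeuTTS Air's actual token format lives in its repository:

```python
# Hypothetical sketch of how an LM-based TTS request could be assembled before
# the backbone is invoked (e.g., via llama-cpp-python). All special-token names
# here are invented for illustration -- they are NOT NeuTTS Air's real format.
def build_tts_prompt(ref_transcript: str, ref_codes: list, text: str) -> str:
    """Interleave the reference transcript, its codec tokens, and target text."""
    codes = "".join(f"<|c{c}|>" for c in ref_codes)  # style tokens from the codec
    return (
        f"<|ref_text|>{ref_transcript}"
        f"<|ref_audio|>{codes}"
        f"<|gen_text|>{text}<|gen_audio|>"
    )

prompt = build_tts_prompt("Hello there.", [12, 507, 33], "Nice to meet you.")
print(prompt)
```

The backbone's continuation after the final tag would be a stream of codec tokens, which the NeuCodec (or ONNX) decoder turns back into 24 kHz audio.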

On-device performance focus

NeuTTS Air advertises "real-time generation on mid-range devices" and offers CPU-first defaults; the GGUF quantizations are intended for laptops and single-board computers. While no RTF numbers are published on the card, the distribution targets local inference without a GPU and demonstrates a working flow through the provided examples and Space.
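"Real-time" claims for TTS are usually stated as a real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced, with RTF < 1 meaning faster than real time. A minimal definition with made-up illustrative numbers:

```python
# Real-time factor (RTF) for TTS: synthesis wall-clock time divided by the
# duration of the generated audio. RTF < 1 means faster than real time.
def rtf(synthesis_seconds: float, audio_seconds: float) -> float:
    return synthesis_seconds / audio_seconds

# e.g. 2.1 s to generate a 6.0 s clip on a laptop CPU (illustrative values only)
example = rtf(2.1, 6.0)
print(f"RTF = {example:.2f} -> {'real-time capable' if example < 1 else 'slower than real time'}")
```

This is the metric the card would need to report, per device class, to make the real-time claim independently checkable.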

Voice cloning workflow

NeuTTS Air requires (1) a reference WAV and (2) the transcript text for that reference. It encodes the reference into style tokens and then synthesizes arbitrary text in the reference speaker's timbre. The Neuphonic team recommends 3–15 s of clean, mono audio and provides pre-encoded samples.
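The 3–15 s mono guidance is easy to enforce before encoding. A minimal pre-flight check using only the Python stdlib (the thresholds mirror the stated recommendation; the helper itself is ours, not part of the NeuTTS Air API):

```python
# Minimal pre-flight check for a cloning reference clip, following the stated
# guidance (mono audio, roughly 3-15 s). Stdlib only; not part of NeuTTS Air.
import io
import wave

def check_reference(wav_bytes: bytes, min_s: float = 3.0, max_s: float = 15.0):
    """Return (ok, duration_s, channels) for an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        duration = w.getnframes() / w.getframerate()
        channels = w.getnchannels()
    return (channels == 1 and min_s <= duration <= max_s), duration, channels

# Demo: a synthetic 5-second silent mono clip at 24 kHz passes the check.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 16-bit samples
    w.setframerate(24_000)
    w.writeframes(b"\x00\x00" * 24_000 * 5)
ok, dur, ch = check_reference(buf.getvalue())
print(ok, dur, ch)  # True 5.0 1
```

Rejecting stereo or out-of-range clips up front avoids the more common silent failure mode: a clone that simply sounds wrong.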

Privacy, responsibility, and watermarking

Neuphonic frames the model for on-device privacy (no audio or text leaves the machine without the user's approval) and notes that all generated audio includes a Perth (Perceptual Threshold) watermark to support responsible use and provenance.

How it compares

Open, local TTS systems exist (e.g., GGUF-based pipelines), but NeuTTS Air is notable for packaging a small LM + neural codec with instant cloning, CPU-first quantizations, and watermarking under a permissive license. The "world's first super-realistic, on-device speech LM" phrasing is the vendor's claim; the verifiable facts are the size, formats, cloning procedure, license, and provided runtimes.

The focus is on system trade-offs: a ~0.7B Qwen-class backbone with GGUF quantization, paired with NeuCodec at 0.8 kbps/24 kHz, is a pragmatic recipe for real-time, CPU-only TTS that preserves timbre from ~3–15 s style references while keeping latency and memory predictable. The Apache-2.0 license and built-in watermarking are deployment-friendly, but publishing RTF/latency on commodity CPUs and cloning-quality vs. reference-length curves would enable rigorous benchmarking against existing local pipelines. Operationally, an offline path with minimal dependencies (eSpeak, llama.cpp/ONNX) lowers privacy and compliance risk for edge agents without sacrificing intelligibility.
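The missing latency curves are cheap to produce. A sketch of such a harness, with a stub standing in for the real model call (the stub and its 60 ms-per-character rate are entirely invented for illustration):

```python
# Sketch of the RTF/latency benchmark the model card does not yet publish:
# time a synthesizer over several utterance lengths. `fake_synthesize` is an
# illustrative stub; a real run would call the actual TTS pipeline instead.
import time

def fake_synthesize(text: str) -> float:
    """Stub: pretend each character yields 60 ms of audio; return duration (s)."""
    return len(text) * 0.06

def benchmark(synth, texts):
    rows = []
    for t in texts:
        start = time.perf_counter()
        audio_seconds = synth(t)
        elapsed = time.perf_counter() - start
        rows.append((len(t), elapsed, elapsed / audio_seconds))  # chars, s, RTF
    return rows

for n_chars, elapsed, rtf in benchmark(fake_synthesize, ["Hi.", "A longer test sentence."]):
    print(f"{n_chars:3d} chars  {elapsed * 1000:8.3f} ms  RTF={rtf:.4f}")
```

Running this against the real pipeline on a few commodity CPUs, across reference lengths from 3 s to 15 s, would yield exactly the cloning-quality and latency curves the review asks for.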


Check out the Model Card on Hugging Face and the GitHub Page for tutorials, code, and notebooks.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
