# Pocket TTS - GGUF Q8_0

Q8_0 quantized version of kyutai/pocket-tts-without-voice-cloning in GGUF format, for browser-based TTS inference via WebAssembly.

Try the demo →

## Model Details

|           | Original                 | GGUF Q8_0           |
|-----------|--------------------------|---------------------|
| File      | tts_b6369a24.safetensors | pocket-tts-q8_0.gguf |
| Size      | 236 MB (BF16)            | 128 MB              |
| Format    | safetensors              | GGUF                |
| Reduction |                          | 46%                 |

## What's included

This GGUF contains the TTS decoder pipeline only: the transformer backbone, flow matching network, mimi decoder + decoder transformer, and the DummyQuantizer output projection.

The mimi encoder (SEANet encoder, encoder transformer, downsample conv) is excluded; TTS only needs the decoder path. This saves ~52 MB (28%) compared to a full-model GGUF.

## Quantization

Per-block Q8_0 quantization (block size 32): each block stores a 2-byte f16 scale plus 32 int8 values, i.e. 34 bytes per 32 weights (~8.5 bits/weight).
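A minimal numpy sketch of this layout (function names are illustrative, not from the actual converter): the scale is the block's max magnitude divided by 127, and each weight is rounded to an int8 code against that scale.

```python
import numpy as np

BLOCK = 32  # Q8_0 block size

def q8_0_quantize(x):
    """Quantize a 1-D float array (length a multiple of 32) into Q8_0 blocks.

    Each block stores one f16 scale and 32 int8 codes:
    2 + 32 = 34 bytes per 32 weights, i.e. ~8.5 bits/weight.
    """
    blocks = x.reshape(-1, BLOCK).astype(np.float32)
    d = (np.abs(blocks).max(axis=1, keepdims=True) / 127.0).astype(np.float16)
    scale = np.where(d == 0, 1.0, d.astype(np.float32))  # guard all-zero blocks
    q = np.clip(np.round(blocks / scale), -127, 127).astype(np.int8)
    return d, q  # (n_blocks, 1) f16 scales, (n_blocks, 32) int8 codes

def q8_0_dequantize(d, q):
    """Reconstruct floats: code * per-block scale."""
    return (q.astype(np.float32) * d.astype(np.float32)).reshape(-1)

x = np.random.default_rng(0).standard_normal(64).astype(np.float32)
d, q = q8_0_quantize(x)
x_hat = q8_0_dequantize(d, q)
# per-element reconstruction error is bounded by half a quantization step
```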

56 tensors quantized: all linear/projection weights in the transformer backbone, flow matching network, and mimi decoder transformer.

114 tensors kept in F32: norms, biases, embeddings, SEANet decoder convolutions, quantizer, and resampling convolutions.

Validation: SQNR > 40 dB on all quantized tensors.
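SQNR here is the ratio of signal power to quantization-noise power, in dB. A self-contained sketch of how such a check can be computed, round-tripping a random tensor through a per-block int8 scheme mimicking the Q8_0 layout (the real validation runs on the model's actual tensors):

```python
import numpy as np

def sqnr_db(x, x_hat):
    """Signal-to-quantization-noise ratio: 10 * log10(P_signal / P_noise)."""
    x = x.astype(np.float64)
    err = x - x_hat.astype(np.float64)
    return 10.0 * np.log10((x ** 2).sum() / (err ** 2).sum())

# Quantize per block of 32 with one scale per block, then reconstruct.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)
blocks = x.reshape(-1, 32)
d = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
q = np.clip(np.round(blocks / d), -127, 127)
x_hat = (q * d).reshape(-1).astype(np.float32)

print(f"SQNR: {sqnr_db(x, x_hat):.1f} dB")
```

For Gaussian-distributed weights, per-block int8 typically lands in the mid-40s dB, comfortably above the 40 dB threshold.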

## Runtime

Weights stay in Q8_0 at runtime. Matmuls run through a tiled WASM SIMD128 quantized-matmul kernel (a fork of candle's), reaching roughly 2x realtime on desktop (M-series Mac, Chrome).
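The real kernel uses integer SIMD dot products on tiles; the following numpy sketch (names illustrative) shows only the semantics that make this possible: each block contributes an int8 dot product scaled by its f16 scale, so no dequantized f32 weight matrix is ever materialized.

```python
import numpy as np

BLOCK = 32

def quantize_q8_0(W):
    """Per-row, per-block Q8_0: (rows, n_blocks) f16 scales + int8 codes."""
    blocks = W.reshape(W.shape[0], -1, BLOCK)
    d = (np.abs(blocks).max(axis=2) / 127.0).astype(np.float16)
    q = np.clip(np.round(blocks / d[..., None].astype(np.float32)), -127, 127)
    return d, q.astype(np.int8)

def q8_0_matvec(d, q, x):
    """y = W @ x from the quantized form: per-block dot products of the
    int8 codes with the activation slice, scaled and summed per row."""
    xb = x.reshape(-1, BLOCK).astype(np.float32)           # (n_blocks, 32)
    dots = np.einsum("rbk,bk->rb", q.astype(np.float32), xb)
    return (dots * d.astype(np.float32)).sum(axis=1)

rng = np.random.default_rng(1)
W = rng.standard_normal((8, 64)).astype(np.float32)
x = rng.standard_normal(64).astype(np.float32)
d, q = quantize_q8_0(W)
y = q8_0_matvec(d, q, x)   # close to W @ x, up to quantization noise
```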

## Files

| File | Size | Description |
|------|------|-------------|
| pocket-tts-q8_0.gguf | 128 MB | Model weights (Q8_0 + F32, decoder only) |
| tokenizer.model | 58 KB | SentencePiece unigram tokenizer |

Voice embeddings are unchanged; use the ones from the original repo.
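Every GGUF file starts with a fixed little-endian header (magic `GGUF`, u32 format version, u64 tensor count, u64 metadata KV count), which is enough to sanity-check a download. A minimal parser, run here on fabricated header bytes for illustration (version 3, 170 tensors matching the 56 + 114 above, and a hypothetical 5 KV pairs; the real counts come from the file itself):

```python
import struct

def read_gguf_header(buf: bytes):
    """Parse the fixed GGUF header: magic, version, tensor count, KV count."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return version, n_tensors, n_kv

# Fabricated header bytes for illustration only.
hdr = struct.pack("<4sIQQ", b"GGUF", 3, 170, 5)
print(read_gguf_header(hdr))  # (3, 170, 5)
```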

## Usage

This model is designed for use with tts-web, a browser-based TTS engine built with Candle and WebAssembly.

## Acknowledgments

Based on Kyutai's Pocket TTS, a 100M-parameter text-to-speech model.

## Disclaimer

This is an independent port by idle intelligence, not affiliated with or endorsed by Kyutai Labs.

## License

CC-BY-4.0 (same as the original model).
