Svara Mimi Indic v3
Mimi audio codec fine-tuned on 10 Indian languages (IndicVoices-R).
Training
- Base model: kyutai/mimi
- Fine-tuned: encoder + semantic codebook (codebook 0)
- Decoder: frozen
- Loss: multi-scale mel reconstruction + WavLM distillation
- Data: IndicVoices-R, 10k samples/language, 10 languages
- Epochs: 10, batch_size: 8, lr: 1e-4
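For a rough sense of training length, the figures above can be turned into an optimizer step count. This is only a sketch: it assumes a single device, no gradient accumulation, exactly 10,000 samples per language, and that every sample is seen once per epoch. None of these are stated in this card.

```python
# Back-of-the-envelope step count from the hyperparameters listed above.
# Assumptions (not stated in the card): one device, no gradient accumulation,
# exactly 10,000 samples per language, each sample seen once per epoch.
import math

samples_per_language = 10_000
num_languages = 10
batch_size = 8
epochs = 10

total_samples = samples_per_language * num_languages
steps_per_epoch = math.ceil(total_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(total_samples, steps_per_epoch, total_steps)  # 100000 12500 125000
```

With gradient accumulation or multi-GPU training, the number of optimizer steps would be correspondingly smaller.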
Usage
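import torch
from svara.codec.mimi import MimiCodec

# Load the base Mimi codec, then overlay the fine-tuned weights on top of it.
codec = MimiCodec.from_pretrained("kyutai/mimi", dtype=torch.bfloat16)
state = torch.load("mimi_final.pt", map_location="cpu")
# strict=False tolerates key mismatches, so check what was actually loaded.
result = codec.model.load_state_dict(state, strict=False)
print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)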
Files
- mimi_final.pt: final checkpoint (full model state_dict)
- mimi_step*.pt: intermediate checkpoints
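To compare intermediate checkpoints in training order, the file names can be sorted by step number. The sketch below assumes the `*` in `mimi_step*.pt` is a plain integer step count (for example `mimi_step2000.pt`), which this card does not confirm. `sorted_step_checkpoints` is a hypothetical helper, not part of the svara package.

```python
import re
from pathlib import Path

# Assumed naming scheme: mimi_step<integer>.pt (not confirmed by this card).
STEP_PATTERN = re.compile(r"mimi_step(\d+)\.pt$")


def sorted_step_checkpoints(names):
    """Return (step, name) pairs sorted by step, skipping names that do not match."""
    found = []
    for name in names:
        match = STEP_PATTERN.search(Path(name).name)
        if match:
            found.append((int(match.group(1)), name))
    return sorted(found)


# Hypothetical file names, for illustration only.
example = ["mimi_step10000.pt", "mimi_final.pt", "mimi_step2000.pt"]
print(sorted_step_checkpoints(example))
# [(2000, 'mimi_step2000.pt'), (10000, 'mimi_step10000.pt')]
```

Sorting numerically rather than alphabetically avoids `mimi_step10000.pt` being ordered before `mimi_step2000.pt`.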