Svara Mimi Indic v3

Mimi audio codec fine-tuned on 10 Indian languages (IndicVoices-R).

Training

  • Base model: kyutai/mimi
  • Fine-tuned: encoder + semantic codebook (codebook 0)
  • Decoder: frozen
  • Loss: multi-scale mel reconstruction + WavLM distillation
  • Data: IndicVoices-R, 10k samples/language, 10 languages
  • Epochs: 10, batch_size: 8, lr: 1e-4
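
The freeze pattern listed above (encoder and codebook 0 trainable, decoder frozen) can be sketched as a prefix filter over named parameters. This is a minimal sketch, not the training script used for this model: `set_trainable`, `FakeParam`, and every parameter name below are hypothetical, and the real names in the Mimi module tree may differ.

```python
class FakeParam:
    """Stand-in for torch.nn.Parameter; only the requires_grad flag matters here."""

    def __init__(self):
        self.requires_grad = True


def set_trainable(named_params, trainable_prefixes):
    """Enable gradients only for parameters whose name starts with a listed prefix.

    Returns the names left trainable, in iteration order.
    """
    prefixes = tuple(trainable_prefixes)
    trainable = []
    for name, param in named_params:
        param.requires_grad = name.startswith(prefixes)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Hypothetical parameter names, chosen only to illustrate the pattern.
params = {
    "encoder.conv.weight": FakeParam(),
    "quantizer.codebook_0.embed": FakeParam(),
    "quantizer.codebook_1.embed": FakeParam(),
    "decoder.conv.weight": FakeParam(),
}
trainable = set_trainable(params.items(), ["encoder.", "quantizer.codebook_0."])
print(trainable)
```

With PyTorch, the same helper should accept `model.named_parameters()` directly, since that method yields `(name, parameter)` pairs and each parameter has a writable `requires_grad` flag.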

Usage

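import torch
from svara.codec.mimi import MimiCodec

# Load the base Mimi codec, then overlay the fine-tuned weights on top of it.
codec = MimiCodec.from_pretrained("kyutai/mimi", dtype=torch.bfloat16)
# weights_only=True restricts unpickling to tensors and plain containers.
state = torch.load("mimi_final.pt", map_location="cpu", weights_only=True)
# strict=False skips missing or unexpected keys instead of raising an error.
codec.model.load_state_dict(state, strict=False)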
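
Because `strict=False` lets key mismatches pass silently, it may be worth inspecting the value that `load_state_dict` returns, which exposes `missing_keys` and `unexpected_keys`. The helper below is a minimal sketch: `load_problems` and the `LoadResult` stand-in are hypothetical names used so the example runs without PyTorch.

```python
from collections import namedtuple

# Stand-in with the same two fields as the object returned by
# torch.nn.Module.load_state_dict.
LoadResult = namedtuple("LoadResult", ["missing_keys", "unexpected_keys"])


def load_problems(result, allow_missing_prefixes=()):
    """Return human-readable descriptions of key mismatches after a non-strict load."""
    allowed = tuple(allow_missing_prefixes)
    problems = []
    for key in result.missing_keys:
        # Keys under an allowed prefix are expected to be absent from the checkpoint.
        if not key.startswith(allowed):
            problems.append(f"missing from checkpoint: {key}")
    for key in result.unexpected_keys:
        problems.append(f"not used by model: {key}")
    return problems


# Illustrative key names only.
demo = LoadResult(missing_keys=["decoder.w"], unexpected_keys=["extra.bias"])
print(load_problems(demo))
```

In practice the call would look like `load_problems(codec.model.load_state_dict(state, strict=False))`, and an empty list would indicate that every key matched.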

Files

  • mimi_final.pt: final checkpoint (full model state_dict)
  • mimi_step*.pt: intermediate checkpoints
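
To put the step numbers in the intermediate checkpoint names in context, the total number of optimizer steps can be estimated from the hyperparameters listed under Training. This is a rough sketch that assumes a single device, no gradient accumulation, and that "step" counts optimizer updates; none of these is stated in this card.

```python
import math

# Values taken from the Training section.
samples_per_language = 10_000
num_languages = 10
batch_size = 8
epochs = 10

total_samples = samples_per_language * num_languages
# Assumption: one optimizer step per batch, with the last partial batch kept.
steps_per_epoch = math.ceil(total_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 12500 125000
```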