How to use zhangj1an/kimi_audio_7b_random with KimiAudio:

```python
# Example usage for KimiAudio
# pip install git+https://github.com/MoonshotAI/Kimi-Audio.git
import soundfile as sf  # required for sf.write below

from kimia_infer.api.kimia import KimiAudio

model = KimiAudio(model_path="zhangj1an/kimi_audio_7b_random", load_detokenizer=True)

sampling_params = {
    "audio_temperature": 0.8,
    "audio_top_k": 10,
    "text_temperature": 0.0,
    "text_top_k": 5,
}

# For ASR: a text instruction followed by the audio to transcribe
asr_audio = "asr_example.wav"
messages_asr = [
    {"role": "user", "message_type": "text", "content": "Please transcribe the following audio:"},
    {"role": "user", "message_type": "audio", "content": asr_audio},
]
_, text = model.generate(messages_asr, **sampling_params, output_type="text")
print(text)

# For Q&A: audio in, both audio and text out
qa_audio = "qa_example.wav"
messages_conv = [{"role": "user", "message_type": "audio", "content": qa_audio}]
wav, text = model.generate(messages_conv, **sampling_params, output_type="both")
sf.write("output_audio.wav", wav.cpu().view(-1).numpy(), 24000)  # 24 kHz output
print(text)
```
Kimi-Audio random / test fixture
A tiny, randomly initialized bundle of Kimi-Audio-7B-Instruct
for vLLM-Omni's L1/L2 core_model CI tests.
It lets CI verify the full pipeline end-to-end without paying the ~42 GB checkpoint cost.
It follows the same on-disk schema as upstream, but every transformer-style component has shrunk dimensions and random weights:
| Component | File | Upstream | Random |
|---|---|---|---|
| LM (Qwen-2-style + MIMO) | model.safetensors | 16 GB sharded | 555 MB (single shard) |
| Whisper encoder | whisper-large-v3/model.safetensors | 3 GB | 17 MB (encoder only) |
| Audio detokenizer (FM DiT) | audio_detokenizer/model.pt | 19 GB | 35 MB |
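As a quick sanity check on the table, the per-component sizes can be summed. This is only arithmetic on the numbers listed above; the variable names are illustrative, and the conversion assumes 1 GB = 1000 MB.

```python
# Sizes copied from the table above (upstream in GB, random bundle in MB).
UPSTREAM_GB = {"lm": 16, "whisper_encoder": 3, "audio_detokenizer": 19}
RANDOM_MB = {"lm": 555, "whisper_encoder": 17, "audio_detokenizer": 35}

upstream_total_gb = sum(UPSTREAM_GB.values())
random_total_mb = sum(RANDOM_MB.values())

# Assumption: decimal units (1 GB = 1000 MB).
shrink_ratio = upstream_total_gb * 1000 / random_total_mb

print(f"upstream (listed components): {upstream_total_gb} GB")
print(f"random bundle (listed components): {random_total_mb} MB")
print(f"shrink ratio: {shrink_ratio:.1f}x")
```

The three listed upstream components add up to a little less than the ~42 GB quoted earlier; the remainder presumably comes from files not shown in the table, which is an assumption rather than something the card states.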
Shrunk dims (token IDs / vocab sizes kept at upstream values):
- LM: hidden_size 3584→512, num_hidden_layers 28→4, num_attention_heads 28→8, intermediate_size 18944→1536, kimia_mimo_layers 6→2, kimia_mimo_transformer_from_layer_index 21→2, kimia_adaptor_input_dim 5120→1536
- Whisper: d_model 1280→384, encoder_layers 32→4, encoder_ffn_dim 5120→1536, encoder_attention_heads 20→6 (decoder weights dropped; vLLM only uses the encoder)
- FM DiT: hidden_size 2304→384, depth 16→4, num_heads 18→6, condition_input_dim 1280→384
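The shrunk values listed above can be checked for internal consistency with a few lines of Python. The sketch below is illustrative only: `DIMS`, `HEAD_KEYS`, and `check_shrunk_dims` are hypothetical names, not part of the bundle, and the two constraints it checks (width divisible by head count, MIMO branch index inside the layer stack) are assumptions about what a transformer-style config typically needs, not rules taken from the Kimi-Audio code.

```python
# (upstream, shrunk) pairs copied from the list above.
DIMS = {
    "lm": {
        "hidden_size": (3584, 512),
        "num_hidden_layers": (28, 4),
        "num_attention_heads": (28, 8),
        "intermediate_size": (18944, 1536),
        "kimia_mimo_layers": (6, 2),
        "kimia_mimo_transformer_from_layer_index": (21, 2),
        "kimia_adaptor_input_dim": (5120, 1536),
    },
    "whisper": {
        "d_model": (1280, 384),
        "encoder_layers": (32, 4),
        "encoder_ffn_dim": (5120, 1536),
        "encoder_attention_heads": (20, 6),
    },
    "fm_dit": {
        "hidden_size": (2304, 384),
        "depth": (16, 4),
        "num_heads": (18, 6),
        "condition_input_dim": (1280, 384),
    },
}

# Which keys hold the model width and the head count for each component.
HEAD_KEYS = {
    "lm": ("hidden_size", "num_attention_heads"),
    "whisper": ("d_model", "encoder_attention_heads"),
    "fm_dit": ("hidden_size", "num_heads"),
}


def check_shrunk_dims(dims=DIMS):
    """Return the per-head dimension of each shrunk component.

    Raises ValueError if a width is not divisible by its head count or if
    the MIMO branch index falls outside the shrunk LM layer stack.
    """
    head_dims = {}
    for name, (width_key, heads_key) in HEAD_KEYS.items():
        width = dims[name][width_key][1]
        heads = dims[name][heads_key][1]
        if width % heads != 0:
            raise ValueError(f"{name}: {width} not divisible by {heads} heads")
        head_dims[name] = width // heads

    lm = dims["lm"]
    branch_index = lm["kimia_mimo_transformer_from_layer_index"][1]
    if branch_index >= lm["num_hidden_layers"][1]:
        raise ValueError("MIMO branch index exceeds the number of LM layers")
    return head_dims


print(check_shrunk_dims())
```

With the values above, every shrunk component ends up with a per-head dimension of 64, and the MIMO branch index (2) stays inside the 4-layer LM stack.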
The bundle does not ship a vocoder/ subdir; KimiBigVGAN loads from
zhangj1an/kimi-audio-bigvgan-hf at runtime.
modeling_moonshot_kimia.py was patched to stub flash_attn symbols (instead of raising)
so AutoModelForCausalLM.from_config(trust_remote_code=True) works in CI without
flash_attn installed; vLLM-Omni replaces the attention impl anyway.
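The stubbing pattern described above can be sketched as follows. This is not the actual patch applied to modeling_moonshot_kimia.py: the helper `import_or_stub` is hypothetical, and the choice of `flash_attn_func` and `flash_attn_varlen_func` as the stubbed symbols is an assumption about which names the modeling file imports. The idea is that importing the module never fails; only calling a stub does.

```python
import importlib


def import_or_stub(module_name, symbol_names):
    """Return {name: callable} for the requested symbols.

    If the module (or a symbol) is missing, a stub is returned instead of
    raising at import time. The stub raises only when it is actually called.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        module = None

    def make_stub(name):
        def _stub(*args, **kwargs):
            raise RuntimeError(
                f"{module_name}.{name} is unavailable; install {module_name} "
                "or use a different attention implementation"
            )

        _stub.__name__ = name
        return _stub

    resolved = {}
    for name in symbol_names:
        real = getattr(module, name, None) if module is not None else None
        resolved[name] = real if real is not None else make_stub(name)
    return resolved


# Assumed symbol names; the real modeling file may import others.
_symbols = import_or_stub("flash_attn", ["flash_attn_func", "flash_attn_varlen_func"])
flash_attn_func = _symbols["flash_attn_func"]
flash_attn_varlen_func = _symbols["flash_attn_varlen_func"]
```

Deferring the failure to call time is what lets model construction succeed in CI: if the serving stack swaps in its own attention implementation, the stubs are never invoked.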
Do not use for actual generation: outputs are noise.