# Ministral-3-8B-Instruct-2512 text-only NVFP4
This repository contains a text-only, NVFP4-quantized derivative of mistralai/Ministral-3-8B-Instruct-2512.
## What this is
The original upstream checkpoint is multimodal. This repo contains:
- the text-only language model extracted from the original checkpoint
- quantized to full NVFP4
- saved in compressed-tensors format
- adjusted for vLLM loading
I'm seeing about 32 tokens/s per request serving this with vLLM's OpenAI-style API on a GB10-based machine.
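For reference, serving looks roughly like the following. The flags and endpoint are the standard vLLM defaults, not settings taken from this repo; adjust to your environment:

```shell
# Serve the model with vLLM's OpenAI-compatible API (listens on port 8000 by default).
# The compressed-tensors NVFP4 quantization is normally auto-detected from the config.
vllm serve ppetermann/Ministral-3-8B-Instruct-2512-textonly-NVFP4

# Query it via the OpenAI-style chat completions endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ppetermann/Ministral-3-8B-Instruct-2512-textonly-NVFP4",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```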
The Hugging Face UI may report unusual parameter and tensor-type information for this model. That is expected for compressed NVFP4 artifacts and should not be interpreted as the original dense model having changed to a literal 5B FP32/BF16 model.
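To see why packed 4-bit weights confuse element counting, here is a toy illustration: when two 4-bit codes are stored per byte, the container tensor holds half as many elements as the dense original. This is purely illustrative and is not the actual compressed-tensors NVFP4 layout:

```python
# Toy 4-bit packing: two 4-bit codes per byte. A UI that counts the
# container's elements sees half the original count, even though no
# information about the dense shape was lost.

def pack_fp4(codes):
    """Pack pairs of 4-bit codes (each 0..15) into single bytes."""
    assert len(codes) % 2 == 0, "toy scheme requires an even count"
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))

def unpack_fp4(packed):
    """Recover the original 4-bit codes from the packed bytes."""
    out = []
    for b in packed:
        out.append(b >> 4)     # high nibble
        out.append(b & 0x0F)   # low nibble
    return out

codes = [3, 15, 0, 7, 8, 1]            # six 4-bit values
packed = pack_fp4(codes)               # stored as only three bytes
assert len(packed) == len(codes) // 2  # "parameter count" appears halved
assert unpack_fp4(packed) == codes     # but the values round-trip exactly
```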
## Important notes
- This is not the original upstream model.
- This is not multimodal. Vision/image support was removed during extraction.
- This repo is intended primarily for vLLM serving.
- The `config.json` was patched to use:

```json
"architectures": ["MistralForCausalLM"]
```
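If you need to apply the same patch to your own extraction, a minimal sketch looks like this. The function name and the in-place rewrite are my own choices; only the `architectures` value comes from this repo:

```python
import json
from pathlib import Path

def patch_architectures(config_path, arch="MistralForCausalLM"):
    """Rewrite the `architectures` field in a config.json so vLLM
    loads the checkpoint as a plain text-only causal LM."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["architectures"] = [arch]
    path.write_text(json.dumps(config, indent=2))
    return config

# Usage (path is an example):
# patch_architectures("Ministral-3-8B-Instruct-2512-textonly-NVFP4/config.json")
```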
## Changes

- 2026-03-11: the first version was reluctant to call tools; after recalibrating, it now calls tools fine.
## Model tree

Base model: mistralai/Ministral-3-8B-Base-2512