Ministral-3-8B-Instruct-2512 text-only NVFP4

This repository contains a text-only, NVFP4-quantized derivative of the upstream Ministral-3-8B-Instruct-2512 checkpoint.

What this is

The original upstream checkpoint is multimodal. This repo contains:

  • the text-only language model extracted from the original checkpoint
  • quantized to full NVFP4
  • saved in compressed-tensors format
  • adjusted for vLLM loading

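For intuition about what NVFP4 quantization means: each weight is stored as a 4-bit E2M1 float (one sign bit, two exponent bits, one mantissa bit) together with a shared scale per small block of elements (the real format, as I understand it, uses FP8 E4M3 scales over 16-element blocks). The following is only an illustrative round-to-nearest sketch with a single float scale per block, not the actual quantizer used to produce this repo:

```python
import numpy as np

# The 16 values representable in FP4 E2M1: +/- {0, 0.5, 1, 1.5, 2, 3, 4, 6}.
_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
E2M1_GRID = np.concatenate([-_POS[::-1], _POS])

def quantize_block(block: np.ndarray) -> tuple[np.ndarray, float]:
    """Round one block onto the E2M1 grid with a single shared scale.

    Illustrative only: real NVFP4 uses FP8 block scales plus a
    tensor-level scale, and a hardware-defined rounding mode.
    """
    scale = np.abs(block).max() / 6.0  # map the block max onto E2M1's max value
    if scale == 0.0:
        return np.zeros_like(block), 0.0
    scaled = block / scale
    idx = np.abs(scaled[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    return E2M1_GRID[idx], scale

block = np.array([0.1, -0.3, 0.8, 1.2])
q, s = quantize_block(block)
recon = q * s  # dequantized approximation of the original block
```

The point of the sketch is only to show why 4-bit codes plus a per-block scale can reconstruct the original values closely when the block's dynamic range is small.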
I'm seeing about 32 tokens/s per request serving this with vLLM's OpenAI-compatible API on a GB10-based machine.
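For reference, a minimal client request against that OpenAI-style endpoint could look like the following. The base URL assumes a default local `vllm serve` on port 8000; the sketch only builds the request object without sending it:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # default local vllm serve address (assumed)

payload = {
    "model": "ppetermann/Ministral-3-8B-Instruct-2512-textonly-NVFP4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment against a running server
```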

The Hugging Face UI may report unusual parameter counts and tensor types for this model. That is expected for compressed NVFP4 artifacts and does not mean the original dense model somehow turned into a literal 5B FP32/BF16 model.
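The smaller reported parameter count follows from packing: two 4-bit codes share one uint8 byte, so a tool that counts stored elements sees roughly half as many weights, alongside scale tensors in other dtypes. A hypothetical illustration of that storage layout (not the exact compressed-tensors byte order):

```python
import numpy as np

def pack_fp4(codes: np.ndarray) -> np.ndarray:
    """Pack pairs of 4-bit codes (0..15) into single uint8 bytes."""
    codes = codes.astype(np.uint8)
    return (codes[0::2] << 4) | codes[1::2]

def unpack_fp4(packed: np.ndarray, n: int) -> np.ndarray:
    """Recover the original 4-bit codes from packed bytes."""
    return np.stack([(packed >> 4) & 0xF, packed & 0xF], axis=1).reshape(-1)[:n]

codes = np.arange(8) % 16   # 8 four-bit weight codes
packed = pack_fp4(codes)    # stored as only 4 uint8 elements
```

A naive element count over `packed` reports half the true number of weights, which is exactly the kind of discrepancy the UI shows for this repo.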

Important notes

  • This is not the original upstream model.
  • This is not multimodal. Vision/image support was removed during extraction.
  • This repo is intended primarily for vLLM serving.
  • The config.json was patched to use:
"architectures": ["MistralForCausalLM"]
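The patch itself amounts to changing one key in config.json. A minimal sketch of the equivalent edit (the file path is assumed to be the local model directory, and the original architecture value is shown as a placeholder):

```python
import json
from pathlib import Path

config_path = Path("config.json")  # inside the local model directory (assumed)

# Stand-in for the upstream config; the real file carries many more keys.
config_path.write_text(json.dumps({"architectures": ["OriginalMultimodalArch"]}))

config = json.loads(config_path.read_text())
# Point vLLM at the plain text-only causal LM implementation.
config["architectures"] = ["MistralForCausalLM"]
config_path.write_text(json.dumps(config, indent=2))
```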

Changes:

  • 2026-03-11: the first version was reluctant to call tools; recalibrated, and it now calls tools fine.