Ministral-3-8B-Instruct-2512 text-only NVFP4

This repository contains a text-only, NVFP4-quantized derivative of the upstream Ministral-3-8B-Instruct-2512 checkpoint.

What this is

The original upstream checkpoint is multimodal. This repo contains:

  • the text-only language model extracted from the original checkpoint
  • quantized to full NVFP4
  • saved in compressed-tensors format
  • adjusted for vLLM loading

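For intuition about what NVFP4 quantization means: each weight is stored as a 4-bit E2M1 float (one sign bit, two exponent bits, one mantissa bit) together with a shared scale per small block of elements (the real format, as I understand it, uses FP8 E4M3 scales over 16-element blocks). The following is only an illustrative round-to-nearest sketch with a single float scale per block, not the actual quantizer used to produce this repo:

```python
import numpy as np

# The 16 values representable in FP4 E2M1: +/- {0, 0.5, 1, 1.5, 2, 3, 4, 6}.
_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
E2M1_GRID = np.concatenate([-_POS[::-1], _POS])

def quantize_block(block: np.ndarray) -> tuple[np.ndarray, float]:
    """Round one block onto the E2M1 grid with a single shared scale.

    Illustrative only: real NVFP4 uses FP8 block scales plus a
    tensor-level scale, and a hardware-defined rounding mode.
    """
    scale = np.abs(block).max() / 6.0  # map the block max onto E2M1's max value
    if scale == 0.0:
        return np.zeros_like(block), 0.0
    scaled = block / scale
    idx = np.abs(scaled[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    return E2M1_GRID[idx], scale

block = np.array([0.1, -0.3, 0.8, 1.2])
q, s = quantize_block(block)
recon = q * s  # dequantized approximation of the original block
```

The point of the sketch is only to show why 4-bit codes plus a per-block scale can reconstruct the original values closely when the block's dynamic range is small.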
I'm seeing about 32 tokens/s per request serving this with vLLM's OpenAI-compatible API on a GB10-based machine.
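For reference, a minimal client request against that OpenAI-style endpoint could look like the following. The base URL assumes a default local `vllm serve` on port 8000; the sketch only builds the request object without sending it:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # default local vllm serve address (assumed)

payload = {
    "model": "ppetermann/Ministral-3-8B-Instruct-2512-textonly-NVFP4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment against a running server
```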

The Hugging Face UI may report unusual parameter counts and tensor types for this model. That is expected for compressed NVFP4 artifacts and does not mean the original dense model somehow turned into a literal 5B FP32/BF16 model.
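The smaller reported parameter count follows from packing: two 4-bit codes share one uint8 byte, so a tool that counts stored elements sees roughly half as many weights, alongside scale tensors in other dtypes. A hypothetical illustration of that storage layout (not the exact compressed-tensors byte order):

```python
import numpy as np

def pack_fp4(codes: np.ndarray) -> np.ndarray:
    """Pack pairs of 4-bit codes (0..15) into single uint8 bytes."""
    codes = codes.astype(np.uint8)
    return (codes[0::2] << 4) | codes[1::2]

def unpack_fp4(packed: np.ndarray, n: int) -> np.ndarray:
    """Recover the original 4-bit codes from packed bytes."""
    return np.stack([(packed >> 4) & 0xF, packed & 0xF], axis=1).reshape(-1)[:n]

codes = np.arange(8) % 16   # 8 four-bit weight codes
packed = pack_fp4(codes)    # stored as only 4 uint8 elements
```

A naive element count over `packed` reports half the true number of weights, which is exactly the kind of discrepancy the UI shows for this repo.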

Important notes

  • This is not the original upstream model.
  • This is not multimodal. Vision/image support was removed during extraction.
  • This repo is intended primarily for vLLM serving.
  • The config.json was patched to use:
"architectures": ["MistralForCausalLM"]
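The patch itself amounts to changing one key in config.json. A minimal sketch of the equivalent edit (the file path is assumed to be the local model directory, and the original architecture value is shown as a placeholder):

```python
import json
from pathlib import Path

config_path = Path("config.json")  # inside the local model directory (assumed)

# Stand-in for the upstream config; the real file carries many more keys.
config_path.write_text(json.dumps({"architectures": ["OriginalMultimodalArch"]}))

config = json.loads(config_path.read_text())
# Point vLLM at the plain text-only causal LM implementation.
config["architectures"] = ["MistralForCausalLM"]
config_path.write_text(json.dumps(config, indent=2))
```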

Changes:

  • 2026-03-11: the first version was reluctant to call tools; recalibrated, and it now calls tools fine.