Qwen3.5-35B-A3B-NVFP4

This is a quantized version of Qwen/Qwen3.5-35B-A3B. This model accepts text and images as inputs and generates text as outputs. The weights and activations were quantized to FP4 using llm-compressor, reducing the model size from 67.0 GB to 21.8 GB (~3.1x reduction) while maintaining 98.8% average accuracy recovery.

Inference

As of 2/27/2026, this model is supported in vLLM nightly. To serve the model:

vllm serve Kbenkhaled/Qwen3.5-35B-A3B-NVFP4 \
    --reasoning-parser qwen3 \
    --enable-prefix-caching

Evaluation

Evaluated with lm-evaluation-harness, 0-shot, thinking mode ON.

Benchmark	Qwen3.5-35B-A3B	Qwen3.5-35B-A3B-NVFP4 (this model)	Recovery
GPQA Diamond	81.31%	80.81%	99.4%
IFEval	95.56%	92.93%	97.2%
MMLU-Redux	92.51%	92.31%	99.8%
Average	89.79%	88.68%	98.8%

Downloads last month: 2,152

Model tree for Kbenkhaled/Qwen3.5-35B-A3B-NVFP4

Base model

Qwen/Qwen3.5-35B-A3B-Base

Finetuned

Qwen/Qwen3.5-35B-A3B

Quantized

(60)

this model