This is Qwen/Qwen3.5-27B quantized with AutoRound to NVFP4, with the linear-attention layers kept in 16-bit (BF16). The model is compatible with vLLM (tested with v0.16.1rc1) and was verified on an RTX Pro 6000. It is currently under evaluation.
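
For reference, a recipe along these lines can produce such a checkpoint. The following is a minimal Python sketch, not the exact script used: it assumes a recent auto-round release that accepts an NVFP4 scheme and per-layer overrides via layer_config, and the "linear_attn" name pattern for the linear-attention blocks is hypothetical.

# Minimal AutoRound NVFP4 sketch (assumptions noted in the comments).
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_id = "Qwen/Qwen3.5-27B"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Keep the linear-attention layers in 16-bit: mapping them to 16 bits in
# layer_config tells AutoRound to leave them unquantized (assumed convention;
# the "linear_attn" substring is a hypothetical, model-dependent name pattern).
layer_config = {
    name: {"bits": 16}
    for name, _ in model.named_modules()
    if "linear_attn" in name
}

autoround = AutoRound(
    model,
    tokenizer,
    scheme="NVFP4",  # assumed: NVFP4 scheme support in recent auto-round releases
    layer_config=layer_config,
)
autoround.quantize_and_save("Qwen3.5-27B-autoround-NVFP4-linearattn-BF16")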

Instructions

uv pip install vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly
uv pip install git+https://github.com/huggingface/transformers.git
vllm serve kaitchup/Qwen3.5-27B-autoround-NVFP4-linearattn-BF16 --max-model-len 262144 --reasoning-parser qwen3
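
Once the server is running, it exposes an OpenAI-compatible API (by default at http://localhost:8000/v1). A minimal Python query, assuming the default address and the openai client:

# Query the vLLM server through its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="kaitchup/Qwen3.5-27B-autoround-NVFP4-linearattn-BF16",
    messages=[{"role": "user", "content": "Summarize NVFP4 quantization in two sentences."}],
    max_tokens=256,
)
print(response.choices[0].message.content)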

Acknowledgments

Thank you to Verda for providing the compute needed for this work; I used their B200s for the quantization. Verda is a European, AI-focused cloud and GPU infrastructure provider with sovereignty, sustainability, data privacy, and performance at its core. Check them out if you're interested.
