This is Qwen/Qwen3.5-9B quantized with AutoRound to W4A16 (4-bit integer weights, 16-bit activations). The model is compatible with vLLM (tested with v0.16.1rc1 on an H200 GPU). Evaluation is still in progress.

Instructions

uv pip install vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly
uv pip install git+https://github.com/huggingface/transformers.git
vllm serve kaitchup/Qwen3.5-9B-autoround-W4A16 --max-model-len 262144 --reasoning-parser qwen3
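Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal sketch of a client request, assuming the server runs on the default `http://localhost:8000` (the prompt and `max_tokens` value are illustrative):

```python
# Sketch of a chat-completions request against the vLLM server started above.
# Assumes the default local endpoint; adjust the host/port if you changed them.
import json
import urllib.request

payload = {
    "model": "kaitchup/Qwen3.5-9B-autoround-W4A16",
    "messages": [
        {"role": "user", "content": "Explain INT4 weight quantization in one sentence."}
    ],
    "max_tokens": 256,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at the same base URL) works equally well.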

Acknowledgments

Thanks to Verda for providing the compute; I used their H200 GPUs. Verda is a European, AI-focused cloud and GPU infrastructure provider with sovereignty, sustainability, data privacy, and performance at its core. Check them out if you're interested.
