---
base_model:
- Qwen/Qwen3.5-397B-A17B
tags:
- qwen
- fp8
- vllm
- compressed-tensors
name: RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic
---
# FP8 Quantized Qwen3.5-397B-A17B
This is a preliminary version (subject to change) of the [Qwen/Qwen3.5-397B-A17B](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) model quantized to FP8.
Both weights and activations are quantized to the FP8 format with [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor): weights statically, activation scales dynamically per token at runtime (hence the `-dynamic` suffix).
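For context, below is a minimal sketch of the kind of llm-compressor recipe typically used to produce FP8-dynamic checkpoints. It is illustrative rather than the exact script used for this model; in particular, the `ignore` list is an assumption (MoE router/gate layers are often also kept in higher precision).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Qwen/Qwen3.5-397B-A17B"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8_DYNAMIC: static per-channel FP8 weights, dynamic per-token FP8
# activation scales, so no calibration dataset is required.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],  # assumption: MoE gate layers may also need excluding
)

oneshot(model=model, recipe=recipe)

SAVE_DIR = "Qwen3.5-397B-A17B-FP8-dynamic"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```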
It is compatible with and tested against vLLM main. Deploy it with: `vllm serve RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic`.
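Once the server is up (it listens on port 8000 by default), it can be queried through vLLM's OpenAI-compatible API, for example:

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible endpoint; the API key is unused by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic",
    messages=[{"role": "user", "content": "Give a one-paragraph overview of FP8 quantization."}],
    temperature=0,
)
print(completion.choices[0].message.content)
```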
# Preliminary Evaluations
1) GSM8k via vLLM's `tests/evals/gsm8k/gsm8k_eval.py` shows almost no accuracy degradation (a reproduction sketch follows the table):
| | Qwen/Qwen3.5-397B-A17B | RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic<br>(this model) |
| ------------ | :--------------------: | :-----------------------------------------------------: |
| Accuracy (%) | 89.5 | 89.4 |
| Recovery | \- | 99.9% |

Recovery is the quantized model's accuracy as a fraction of the baseline: 89.4 / 89.5 ≈ 99.9%.
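The eval script queries a running server. The flags below are assumptions based on similar GSM8K harnesses, not verified against vLLM main; check the script's `--help` on a vLLM checkout:

```bash
# Start the server (listens on port 8000 by default).
vllm serve RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic &

# Run from a vLLM checkout; flags are illustrative.
python tests/evals/gsm8k/gsm8k_eval.py --host http://localhost --port 8000
```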
2) Under greedy sampling, the model generates text almost identical to the unquantized baseline (`Qwen/Qwen3.5-397B-A17B` on the left, `RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic` on the right):
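A minimal sketch for reproducing such a comparison with vLLM's offline API (the prompt is arbitrary; a model this size needs a suitably sized multi-GPU node):

```python
from vllm import LLM, SamplingParams

prompt = "Briefly explain mixture-of-experts routing."
# temperature=0 gives greedy decoding, so each model's output is deterministic
# and the two generations can be diffed directly.
params = SamplingParams(temperature=0, max_tokens=256)

for model_id in (
    "Qwen/Qwen3.5-397B-A17B",
    "RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic",
):
    # In practice, run each model in a separate process to release GPU memory,
    # and set tensor_parallel_size=... to shard across GPUs.
    llm = LLM(model=model_id)
    output = llm.generate([prompt], params)[0]
    print(f"=== {model_id} ===\n{output.outputs[0].text}\n")
```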

**Note**: More rigorous evaluations are currently in progress and will be available soon.