Qwen/Qwen2.5-VL-7B Nunchaku (Text Encoder)


This is a quantized text encoder (text_encoder) artifact for Qwen image generation / image editing workflows. It is exported in a format that the Nunchaku runtime can load directly, and is intended to replace the text_encoder inside a diffusers pipeline to reduce VRAM usage and improve inference efficiency.

Quantization quality

  • hidden_states_last
    • rel_l2: 0.2465697987902546
    • cosine: 0.969494104385376
  • prompt_embeds_trimmed
    • rel_l2: 0.2465697987902546
    • cosine: 0.969494104385376
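For reference, `rel_l2` is the L2 norm of the quantization error relative to the norm of the full-precision reference, and `cosine` is the cosine similarity between the flattened embeddings. A minimal sketch of how such metrics are computed (the tensors here are plain Python lists for illustration; the actual evaluation operates on the encoder's output tensors):

```python
import math

def rel_l2(quantized, reference):
    # Relative L2 error: ||q - r|| / ||r||
    err = math.sqrt(sum((q - r) ** 2 for q, r in zip(quantized, reference)))
    ref = math.sqrt(sum(r ** 2 for r in reference))
    return err / ref

def cosine(a, b):
    # Cosine similarity between two flattened embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```

Identical values for hidden_states_last and prompt_embeds_trimmed are expected when the trimmed prompt embeddings are taken directly from the last hidden states.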

Notes

  • svdq-int4-Qwen2.5vl-Nunchaku.safetensors is for image editing models such as QwenImageEditPipeline / QwenImageEditPlusPipeline. The vision tower is not quantized.
  • svdq-int4-Qwen2.5vl-text-Nunchaku.safetensors is for the QwenImagePipeline text-to-image model. It only quantizes the text encoder.
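The file choice above can be encoded in a small helper. This is a hypothetical convenience function (the function name and mapping are illustrative, derived only from the two notes above):

```python
# Hypothetical helper: pick the checkpoint file for a given
# diffusers pipeline class name, following the notes above.
EDIT_PIPELINES = {"QwenImageEditPipeline", "QwenImageEditPlusPipeline"}

def checkpoint_for(pipeline_name: str) -> str:
    if pipeline_name in EDIT_PIPELINES:
        # Editing models: the vision tower is not quantized
        return "svdq-int4-Qwen2.5vl-Nunchaku.safetensors"
    if pipeline_name == "QwenImagePipeline":
        # Text-to-image: only the text encoder is quantized
        return "svdq-int4-Qwen2.5vl-text-Nunchaku.safetensors"
    raise ValueError(f"unsupported pipeline: {pipeline_name}")
```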

Install Nunchaku first

  • Note: As of 2026-04-09, the Nunchaku PR for this functionality has not yet been merged into the official main branch. To try it early, pull and merge the code from nunchaku-ai/nunchaku#927.

  • Official installation guide (recommended source of truth): https://nunchaku.tech/docs/nunchaku/installation/installation.html

Recommended: install the official prebuilt wheel

  • Prerequisite: install PyTorch first (>= 2.5 at minimum; the exact version requirement depends on the wheel you select)
  • Install the Nunchaku wheel: choose the wheel that matches your environment from GitHub Releases / Hugging Face / ModelScope (cp311 means Python 3.11):
    • https://github.com/nunchaku-ai/nunchaku/releases
# Example: choose the correct wheel URL for your torch/cuda/python version
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
  • Tip: this model is INT4 quantized. Follow the official docs and the wheel compatibility matrix when choosing the package for your torch/cuda/python environment.
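When picking a wheel, the cpXY part of the filename must match your Python interpreter, and the torch part must match your installed PyTorch/CUDA build. A small sketch for reading those values off your environment (the tag format follows standard wheel naming):

```python
import sys

def python_wheel_tag() -> str:
    # cp311 == CPython 3.11, matching the cpXY part of the wheel filename
    return f"cp{sys.version_info.major}{sys.version_info.minor}"

print("Python wheel tag:", python_wheel_tag())
try:
    import torch  # reports the torch/CUDA build the wheel must match
    print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
except ImportError:
    print("torch is not installed; install PyTorch first")
```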

Usage

1. Text-to-image (QwenImagePipeline)

import torch
from diffusers import QwenImagePipeline
from nunchaku import NunchakuQwenEncoderModel

base_model_dir = "/path/to/qwen-image-model"   # Your Qwen base model directory or HF model id
model_path = "/path/to/Qwen2.5vl-Nunchaku/svdq-int4-Qwen2.5vl-text-Nunchaku.safetensors"  # Use the text variant for QwenImagePipeline
device = "cuda"
torch_dtype = torch.bfloat16  # torch.float16 also works
text_encoder = NunchakuQwenEncoderModel.from_pretrained(
    model_path,
    device=device,
    torch_dtype=torch_dtype,
)

pipe = QwenImagePipeline.from_pretrained(
    base_model_dir,
    text_encoder=text_encoder,
    torch_dtype=torch_dtype,
)
pipe.to(device)

# Generate an image from a text prompt
image = pipe(prompt="A cat wearing a wizard hat").images[0]
image.save("qwen-image.png")

2. Image editing (QwenImageEditPlusPipeline)

import torch
from diffusers import QwenImageEditPlusPipeline
from diffusers.utils import load_image
from nunchaku import NunchakuQwenEncoderModel

base_model_dir = "/path/to/qwen-image-edit-model"
model_path = "/path/to/Qwen2.5vl-Nunchaku/svdq-int4-Qwen2.5vl-Nunchaku.safetensors"  # Use the non-text variant for editing models
device = "cuda"
torch_dtype = torch.bfloat16

text_encoder = NunchakuQwenEncoderModel.from_pretrained(
    model_path,
    device=device,
    torch_dtype=torch_dtype,
)

pipe = QwenImageEditPlusPipeline.from_pretrained(
    base_model_dir,
    text_encoder=text_encoder,
    torch_dtype=torch_dtype,
).to(device)

image = load_image("https://example.com/your_image.png").convert("RGB")
result = pipe(
    prompt="Turn the cat in the image into one wearing a wizard hat",
    image=image,
).images[0]
result.save("qwen-image-edit-plus.png")

Recommended environment

  • Python: >= 3.11
  • PyTorch: >= 2.9 (a CUDA environment is recommended)
  • transformers: 5.3
  • diffusers: 0.37
  • nunchaku: runtime package providing NunchakuQwenEncoderModel (see installation notes above)
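The recommended versions above can be sanity-checked against what is actually installed. A minimal sketch using only the standard library (the version comparison here is simplified and ignores pre-release suffixes):

```python
from importlib import metadata

def version_tuple(v: str) -> tuple:
    # "2.9.1+cu121" -> (2, 9, 1); ignores local build suffixes
    return tuple(int(p) for p in v.split("+")[0].split(".") if p.isdigit())

def meets_minimum(installed: str, required: str) -> bool:
    return version_tuple(installed) >= version_tuple(required)

# Check the recommendations above against the current environment
for pkg, minimum in [("torch", "2.9"), ("diffusers", "0.37"), ("transformers", "5.3")]:
    try:
        ok = meets_minimum(metadata.version(pkg), minimum)
        print(pkg, "OK" if ok else f"older than {minimum}")
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
```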