Z-Image SDNQ uint4 + SVD-r32

This is a 4-bit quantized version of Tongyi-MAI/Z-Image using SDNQ (Structured Decomposable Neural Quantization) with SVD decomposition.

Model Details

  • Base Model: Tongyi-MAI/Z-Image
  • Quantization Method: SDNQ uint4 + SVD
  • SVD Rank: 32
  • Model Size: ~2GB (reduced from ~8GB)
  • Precision: 4-bit weights with int8 matmul
  • Performance: Minimal quality loss compared to full precision

Quantization Specifications

weights_dtype = "uint4"          # 4-bit weights
use_svd = True                   # SVD decomposition enabled
svd_rank = 32                    # SVD rank for quality preservation
quantized_matmul_dtype = "int8"  # Compute precision
group_size = 0                   # Auto group size
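To see where the roughly 4x size reduction comes from, here is a back-of-the-envelope calculation for a single linear layer. The storage layout below (packed 4-bit weights, one bf16 scale per output channel, bf16 SVD correction factors) is an assumption for illustration, not SDNQ's actual on-disk format:

```python
def linear_bytes_bf16(out_f: int, in_f: int) -> int:
    """Full-precision storage: 2 bytes per bf16 weight."""
    return out_f * in_f * 2

def linear_bytes_uint4_svd(out_f: int, in_f: int, rank: int = 32) -> int:
    """Hypothetical uint4 + SVD layout."""
    packed = out_f * in_f // 2       # two 4-bit values per byte
    scales = out_f * 2               # one bf16 scale per output channel (assumption)
    svd = 2 * rank * (out_f + in_f)  # bf16 U (out_f x r) and V (r x in_f) factors
    return packed + scales + svd

base = linear_bytes_bf16(4096, 4096)
quant = linear_bytes_uint4_svd(4096, 4096, rank=32)
print(f"{base / quant:.2f}x smaller")  # roughly 3.7-3.8x for a 4096x4096 layer
```

The rank-32 correction factors cost little relative to the packed weights, which is why the overall footprint still lands near the 4x ideal of going from 16-bit to 4-bit storage.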

Usage

Basic Text-to-Image Generation

import torch
from diffusers import DiffusionPipeline
from sdnq.loader import apply_sdnq_options_to_model

# Load the quantized model
pipe = DiffusionPipeline.from_pretrained(
    "YOUR_USERNAME/Z-Image-SDNQ-uint4-svd-r32",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# Apply SDNQ configuration
pipe.transformer = apply_sdnq_options_to_model(
    pipe.transformer,
    use_quantized_matmul=False  # Set to False for Windows/No-Triton
)
pipe.text_encoder = apply_sdnq_options_to_model(
    pipe.text_encoder,
    use_quantized_matmul=False
)

pipe = pipe.to("cuda")

# Generate image
image = pipe(
    prompt="a beautiful landscape with mountains and a lake at sunset",
    num_inference_steps=30,
    guidance_scale=1.0
).images[0]

image.save("output.png")

9:16 Portrait Generation (Maximum Resolution)

import torch
from diffusers import DiffusionPipeline
from sdnq.loader import apply_sdnq_options_to_model

# Configuration
WIDTH = 768
HEIGHT = 1344  # 9:16 aspect ratio
PROMPT = "a beautiful landscape with mountains and a lake at sunset, highly detailed, 8k, masterpiece"
STEPS = 30
GUIDANCE = 1.0

# Load model
pipe = DiffusionPipeline.from_pretrained(
    "YOUR_USERNAME/Z-Image-SDNQ-uint4-svd-r32",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# Configure SDNQ
pipe.transformer = apply_sdnq_options_to_model(
    pipe.transformer,
    use_quantized_matmul=False
)
pipe.text_encoder = apply_sdnq_options_to_model(
    pipe.text_encoder,
    use_quantized_matmul=False
)

pipe = pipe.to("cuda")

# Generate
image = pipe(
    prompt=PROMPT,
    num_inference_steps=STEPS,
    guidance_scale=GUIDANCE,
    width=WIDTH,
    height=HEIGHT
).images[0]

image.save("portrait_9x16.png")

Recommended Settings

Aspect Ratio      Resolution  Steps  Guidance Scale
1:1 (Square)      768x768     30     1.0
16:9 (Landscape)  1344x768    30     1.0
9:16 (Portrait)   768x1344    30     1.0
4:3               1024x768    30     1.0

Note: For maximum quality, use 30 steps. For faster generation (lower quality), you can use 4-6 steps.
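Diffusion pipelines generally require width and height divisible by the model's spatial downsampling factor, and all of the resolutions above are multiples of 16. A small helper to snap arbitrary sizes to valid ones (the factor 16 is an assumption for this model; check the pipeline's error message if a size is rejected):

```python
def snap_to_multiple(size: int, multiple: int = 16) -> int:
    """Round a requested dimension to the nearest valid multiple."""
    return max(multiple, round(size / multiple) * multiple)

# e.g. a requested 770x1350 portrait snaps to the 768x1344 preset above
print(snap_to_multiple(770), snap_to_multiple(1350))  # 768 1344
```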

Requirements

pip install torch diffusers transformers sdnq

System Requirements

  • GPU: NVIDIA GPU with 8GB+ VRAM recommended
  • CUDA: 11.8 or higher
  • RAM: 16GB+ system RAM
  • Disk: ~2GB for model storage

Performance

  • VRAM Usage: ~4-6GB (depending on resolution)
  • Generation Speed: ~5-10 seconds per image (30 steps, 768x1344)
  • Quality: Near-identical to full precision model

Quantization Method

This model uses SDNQ (Structured Decomposable Neural Quantization) which:

  • Reduces model size by ~75% (8GB → 2GB)
  • Maintains high image quality through SVD decomposition
  • Enables faster inference on consumer GPUs
  • Supports both int-based and float-based quantization schemes
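The intuition behind pairing SVD with low-bit quantization can be shown on a toy matrix: keep a rank-32 approximation in full precision, then 4-bit quantize only the residual, whose smaller dynamic range yields smaller rounding error. This is a simplified illustration of the idea, not SDNQ's actual algorithm or storage scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)

def quant_uint4_rowwise(M):
    """Symmetric per-row 4-bit quantization (levels -8..7), then dequantize."""
    scale = np.abs(M).max(axis=1, keepdims=True) / 7.0
    return np.clip(np.round(M / scale), -8, 7) * scale

# Plain 4-bit quantization of the full matrix
err_plain = np.abs(W - quant_uint4_rowwise(W)).mean()

# SVD-assisted: keep a rank-32 correction in full precision,
# quantize only the (smaller-magnitude) residual
rank = 32
U, S, Vt = np.linalg.svd(W, full_matrices=False)
low_rank = (U[:, :rank] * S[:rank]) @ Vt[:rank, :]
err_svd = np.abs(W - (low_rank + quant_uint4_rowwise(W - low_rank))).mean()

print(err_svd < err_plain)  # the low-rank correction shrinks quantization error
```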

How It Was Quantized

from diffusers import DiffusionPipeline
from sdnq.loader import sdnq_post_load_quant, save_sdnq_model

# Load base model
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# Apply quantization to transformer
quantized_transformer = sdnq_post_load_quant(
    pipe.transformer,
    weights_dtype="uint4",
    use_svd=True,
    svd_rank=32,
    quantized_matmul_dtype="int8",
    group_size=0
)

pipe.transformer = quantized_transformer

# Save
save_sdnq_model(pipe, "./Z-Image-SDNQ-uint4-svd-r32", is_pipeline=True)

Limitations

  • Requires SDNQ library (pip install sdnq)
  • Best results with NVIDIA GPUs (CPU inference not recommended)
  • Some quality trade-off compared to full precision (minimal in most cases)

Citation

If you use this model, please cite the original Z-Image paper:

@article{zimage2024,
  title={Z-Image: Efficient Text-to-Image Synthesis},
  author={Tongyi-MAI Team},
  year={2024}
}

License

This model inherits the Apache 2.0 license from the base Tongyi-MAI/Z-Image model.

