Z-Image SDNQ uint4 + SVD-r32

This is a 4-bit quantized version of Tongyi-MAI/Z-Image using SDNQ (Structured Decomposable Neural Quantization) with SVD decomposition.

Model Details

  • Base Model: Tongyi-MAI/Z-Image
  • Quantization Method: SDNQ uint4 + SVD
  • SVD Rank: 32
  • Model Size: ~2GB (reduced from ~8GB)
  • Precision: 4-bit weights with int8 matmul
  • Performance: Minimal quality loss compared to full precision

Quantization Specifications

weights_dtype = "uint4"          # 4-bit weights
use_svd = True                   # SVD decomposition enabled
svd_rank = 32                    # SVD rank for quality preservation
quantized_matmul_dtype = "int8"  # Compute precision
group_size = 0                   # Auto group size
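To see where the roughly 4x size reduction comes from, here is a back-of-the-envelope calculation for a single linear layer. The storage layout below (packed 4-bit weights, one bf16 scale per output channel, bf16 SVD correction factors) is an assumption for illustration, not SDNQ's actual on-disk format:

```python
def linear_bytes_bf16(out_f: int, in_f: int) -> int:
    """Full-precision storage: 2 bytes per bf16 weight."""
    return out_f * in_f * 2

def linear_bytes_uint4_svd(out_f: int, in_f: int, rank: int = 32) -> int:
    """Hypothetical uint4 + SVD layout."""
    packed = out_f * in_f // 2       # two 4-bit values per byte
    scales = out_f * 2               # one bf16 scale per output channel (assumption)
    svd = 2 * rank * (out_f + in_f)  # bf16 U (out_f x r) and V (r x in_f) factors
    return packed + scales + svd

base = linear_bytes_bf16(4096, 4096)
quant = linear_bytes_uint4_svd(4096, 4096, rank=32)
print(f"{base / quant:.2f}x smaller")  # roughly 3.7-3.8x for a 4096x4096 layer
```

The rank-32 correction factors cost little relative to the packed weights, which is why the overall footprint still lands near the 4x ideal of going from 16-bit to 4-bit storage.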

Usage

Basic Text-to-Image Generation

import torch
from diffusers import DiffusionPipeline
from sdnq.loader import apply_sdnq_options_to_model

# Load the quantized model
pipe = DiffusionPipeline.from_pretrained(
    "YOUR_USERNAME/Z-Image-SDNQ-uint4-svd-r32",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# Apply SDNQ configuration
pipe.transformer = apply_sdnq_options_to_model(
    pipe.transformer,
    use_quantized_matmul=False  # Set to False for Windows/No-Triton
)
pipe.text_encoder = apply_sdnq_options_to_model(
    pipe.text_encoder,
    use_quantized_matmul=False
)

pipe = pipe.to("cuda")

# Generate image
image = pipe(
    prompt="a beautiful landscape with mountains and a lake at sunset",
    num_inference_steps=30,
    guidance_scale=1.0
).images[0]

image.save("output.png")

9:16 Portrait Generation (Maximum Resolution)

import torch
from diffusers import DiffusionPipeline
from sdnq.loader import apply_sdnq_options_to_model

# Configuration
WIDTH = 768
HEIGHT = 1344  # 9:16 aspect ratio
PROMPT = "a beautiful landscape with mountains and a lake at sunset, highly detailed, 8k, masterpiece"
STEPS = 30
GUIDANCE = 1.0

# Load model
pipe = DiffusionPipeline.from_pretrained(
    "YOUR_USERNAME/Z-Image-SDNQ-uint4-svd-r32",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# Configure SDNQ
pipe.transformer = apply_sdnq_options_to_model(
    pipe.transformer,
    use_quantized_matmul=False
)
pipe.text_encoder = apply_sdnq_options_to_model(
    pipe.text_encoder,
    use_quantized_matmul=False
)

pipe = pipe.to("cuda")

# Generate
image = pipe(
    prompt=PROMPT,
    num_inference_steps=STEPS,
    guidance_scale=GUIDANCE,
    width=WIDTH,
    height=HEIGHT
).images[0]

image.save("portrait_9x16.png")

Recommended Settings

Aspect Ratio      Resolution  Steps  Guidance Scale
1:1 (Square)      768x768     30     1.0
16:9 (Landscape)  1344x768    30     1.0
9:16 (Portrait)   768x1344    30     1.0
4:3               1024x768    30     1.0

Note: For maximum quality, use 30 steps. For faster generation (lower quality), you can use 4-6 steps.
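Diffusion pipelines generally require width and height divisible by the model's spatial downsampling factor, and all of the resolutions above are multiples of 16. A small helper to snap arbitrary sizes to valid ones (the factor 16 is an assumption for this model; check the pipeline's error message if a size is rejected):

```python
def snap_to_multiple(size: int, multiple: int = 16) -> int:
    """Round a requested dimension to the nearest valid multiple."""
    return max(multiple, round(size / multiple) * multiple)

# e.g. a requested 770x1350 portrait snaps to the 768x1344 preset above
print(snap_to_multiple(770), snap_to_multiple(1350))  # 768 1344
```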

Requirements

pip install torch diffusers transformers sdnq

System Requirements

  • GPU: NVIDIA GPU with 8GB+ VRAM recommended
  • CUDA: 11.8 or higher
  • RAM: 16GB+ system RAM
  • Disk: ~2GB for model storage

Performance

  • VRAM Usage: ~4-6GB (depending on resolution)
  • Generation Speed: ~5-10 seconds per image (30 steps, 768x1344)
  • Quality: Near-identical to full precision model

Quantization Method

This model uses SDNQ (Structured Decomposable Neural Quantization) which:

  • Reduces model size by ~75% (8GB → 2GB)
  • Maintains high image quality through SVD decomposition
  • Enables faster inference on consumer GPUs
  • Supports both int-based and float-based quantization schemes
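The intuition behind pairing SVD with low-bit quantization can be shown on a toy matrix: keep a rank-32 approximation in full precision, then 4-bit quantize only the residual, whose smaller dynamic range yields smaller rounding error. This is a simplified illustration of the idea, not SDNQ's actual algorithm or storage scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)

def quant_uint4_rowwise(M):
    """Symmetric per-row 4-bit quantization (levels -8..7), then dequantize."""
    scale = np.abs(M).max(axis=1, keepdims=True) / 7.0
    return np.clip(np.round(M / scale), -8, 7) * scale

# Plain 4-bit quantization of the full matrix
err_plain = np.abs(W - quant_uint4_rowwise(W)).mean()

# SVD-assisted: keep a rank-32 correction in full precision,
# quantize only the (smaller-magnitude) residual
rank = 32
U, S, Vt = np.linalg.svd(W, full_matrices=False)
low_rank = (U[:, :rank] * S[:rank]) @ Vt[:rank, :]
err_svd = np.abs(W - (low_rank + quant_uint4_rowwise(W - low_rank))).mean()

print(err_svd < err_plain)  # the low-rank correction shrinks quantization error
```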

How It Was Quantized

from diffusers import DiffusionPipeline
from sdnq.loader import sdnq_post_load_quant, save_sdnq_model

# Load base model
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# Apply quantization to transformer
quantized_transformer = sdnq_post_load_quant(
    pipe.transformer,
    weights_dtype="uint4",
    use_svd=True,
    svd_rank=32,
    quantized_matmul_dtype="int8",
    group_size=0
)

pipe.transformer = quantized_transformer

# Save
save_sdnq_model(pipe, "./Z-Image-SDNQ-uint4-svd-r32", is_pipeline=True)

Limitations

  • Requires SDNQ library (pip install sdnq)
  • Best results with NVIDIA GPUs (CPU inference not recommended)
  • Some quality trade-off compared to full precision (minimal in most cases)

Citation

If you use this model, please cite the original Z-Image paper:

@article{zimage2024,
  title={Z-Image: Efficient Text-to-Image Synthesis},
  author={Tongyi-MAI Team},
  year={2024}
}

License

This model inherits the Apache 2.0 license from the base Tongyi-MAI/Z-Image model.

