
25% faster generation with Flash SDPA

#139
by ykarout - opened

Tested on Blackwell (RTX 5080), 25% faster than native SDPA:
┌───────────────────────┬────────────┬──────────┐
│ Backend               │ Total time │ Per step │
├───────────────────────┼────────────┼──────────┤
│ Native SDPA (default) │ 208.49s    │ ~4.17s   │
├───────────────────────┼────────────┼──────────┤
│ Flash SDPA            │ 156.67s    │ ~3.13s   │
└───────────────────────┴────────────┴──────────┘
Flash SDPA is ~25% faster, saving about 52 seconds on a 50-step Full HD generation.
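As a quick sanity check, the per-step times and the ~25% figure follow directly from the two totals:

```python
native_total = 208.49  # seconds for 50 steps with native SDPA
flash_total = 156.67   # seconds for 50 steps with Flash SDPA
steps = 50

native_per_step = native_total / steps   # ~4.17s
flash_per_step = flash_total / steps     # ~3.13s
saved = native_total - flash_total       # ~51.82s
speedup_pct = saved / native_total * 100 # ~24.9%, i.e. roughly 25%

print(f"per step: {native_per_step:.2f}s vs {flash_per_step:.2f}s")
print(f"saved {saved:.2f}s ({speedup_pct:.0f}% faster)")
```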

Use the following code:

import torch
from diffusers import ZImagePipeline
from diffusers.models.attention_dispatch import attention_backend

# Load the pipeline
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
pipe.enable_model_cpu_offload()

# Generate image
prompt = "Two young Asian women stand close together against a backdrop of a plain gray textured wall, possibly an indoor carpeted floor. The woman on the left has long, curly hair, wears a navy blue sweater with cream-colored ruffles on the left sleeve, a white stand-up collar shirt underneath, and white trousers; she wears small gold earrings"
negative_prompt = ""  # Optional, but useful when you want to suppress unwanted content

with attention_backend("_native_flash"):
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=1920,
        width=1088,
        cfg_normalization=False,
        num_inference_steps=50,
        guidance_scale=4,
        generator=torch.Generator("cuda").manual_seed(42),
    ).images[0]

image.save("example.png")
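To reproduce the comparison on other hardware, both runs can be wrapped with a plain stdlib timer. A minimal sketch (the `sum(...)` calls are stand-ins for the actual `pipe(...)` calls above, once with the default backend and once inside `attention_backend("_native_flash")`):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    yield
    results[label] = time.perf_counter() - start

results = {}
with timed("native_sdpa", results):
    sum(range(10**6))  # stand-in for pipe(...) with the default backend
with timed("flash_sdpa", results):
    sum(range(10**6))  # stand-in for pipe(...) under attention_backend("_native_flash")

steps = 50
for label, seconds in results.items():
    print(f"{label}: {seconds:.2f}s total, {seconds / steps:.4f}s per step")
```

Note that with `enable_model_cpu_offload()` the first run also pays layer-transfer overhead, so a warm-up generation before timing gives fairer numbers.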

4080S + Flash Attention 2.83 in ComfyUI: a 1k×2k image takes 130 seconds. TAT

edited: I just realized this is a discussion about the turbo model; I thought it was about z-image-base :P
How do you guys tolerate this speed... I went downstairs to buy a pack of cigarettes and encountered nine people who had XXXXed "that man", and when I came back, the progress bar wasn't even at the bottom.
