25% faster generation with Flash SDPA
Tested on Blackwell (RTX 5080), 25% faster than native SDPA:
┌───────────────────────┬────────────┬──────────┐
│ Backend               │ Total time │ Per step │
├───────────────────────┼────────────┼──────────┤
│ Native SDPA (default) │ 208.49s    │ ~4.17s   │
├───────────────────────┼────────────┼──────────┤
│ Flash SDPA            │ 156.67s    │ ~3.13s   │
└───────────────────────┴────────────┴──────────┘
Flash SDPA is ~25% faster, saving about 52 seconds on this 50-step 1088×1920 generation.
Use this code:
import torch
from diffusers import ZImagePipeline
from diffusers.models.attention_dispatch import attention_backend

# Load the pipeline
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
pipe.enable_model_cpu_offload()

# Generate image
prompt = "Two young Asian women stand close together against a backdrop of a plain gray textured wall, possibly an indoor carpeted floor. The woman on the left has long, curly hair, wears a navy blue sweater with cream-colored ruffles on the left sleeve, a white stand-up collar shirt underneath, and white trousers; she wears small gold earrings"
negative_prompt = ""  # Optional; useful for suppressing unwanted content

# Run attention through the Flash SDPA backend
with attention_backend("_native_flash"):
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=1920,
        width=1088,
        cfg_normalization=False,
        num_inference_steps=50,
        guidance_scale=4,
        generator=torch.Generator("cuda").manual_seed(42),
    ).images[0]

image.save("example.png")
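For reference, here is a minimal sketch of how the two backends could be timed against each other. It assumes the pipe and prompt defined above; the backend names "native" (the default SDPA path) and "_native_flash" come from diffusers' attention dispatcher, and the exact set of available names may vary between diffusers versions:

import time

import torch
from diffusers.models.attention_dispatch import attention_backend

def time_backend(backend_name, steps=50):
    # Synchronize first so the timer measures only this generation
    torch.cuda.synchronize()
    start = time.perf_counter()
    with attention_backend(backend_name):
        pipe(
            prompt=prompt,
            height=1920,
            width=1088,
            num_inference_steps=steps,
            guidance_scale=4,
            generator=torch.Generator("cuda").manual_seed(42),
        )
    torch.cuda.synchronize()
    total = time.perf_counter() - start
    print(f"{backend_name}: {total:.2f}s total, ~{total / steps:.2f}s per step")

# Compare the default backend against Flash SDPA on the same seed
for name in ("native", "_native_flash"):
    time_backend(name)

Note that run order can matter (warm-up, kernel caching), so averaging a few runs per backend gives steadier numbers than a single pass.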
4080S + Flash Attention 2.83 in ComfyUI: a 1k×2k image takes 130 seconds. TAT
edited: I just realized this is a discussion about the turbo model; I thought it was about z-image-base :P
How do you guys tolerate this speed... I went downstairs to buy a pack of cigarettes and encountered nine people who had XXXXed "that man", and when I came back, the progress bar wasn't even at the bottom.