AX-M1 (AX8850) — Realistic Vision (SD 1.5) Euler 512 (AXMODEL weights)
This repository hosts the compiled .axmodel weights for running Realistic Vision (Stable Diffusion 1.5–based) with Euler / EulerDiscreteScheduler at 512×512 on Radxa AI Core AX-M1 (AX8850).
Runtime / scripts (GitHub): https://github.com/Mojo24x7/SD1.5_AXM1-AX8850_Euler
These files are compiled AXERA artifacts (.axmodel) intended for AX-M1 / AX8850 inference via AXCLRT/axengine. They are not raw PyTorch weights.
What’s inside
Main weights:
- sd15_text_encoder_sim.axmodel: CLIP text encoder (prompt → text embeddings)
- unet.axmodel: UNet denoiser (latent diffusion core)
- vae_decoder.axmodel: VAE decoder (latent → RGB image)
Optional (needed for img2img / masked workflows):
- vae_encoder.axmodel: VAE encoder (RGB → latent)
Download
Option A — Git LFS (recommended)
git lfs install
git clone https://huggingface.co/Mojo24x7/sd15-axm1-euler512-axmodels
Option B — Hugging Face CLI
pip install -U "huggingface_hub[cli]"
huggingface-cli download Mojo24x7/sd15-axm1-euler512-axmodels \
--local-dir sd15-axm1-euler512-axmodels
Where to place the files
In the runtime repo, place these into:
./axmodels/
  sd15_text_encoder_sim.axmodel
  unet.axmodel
  vae_decoder.axmodel
  vae_encoder.axmodel (optional)
The runtime/scripts repo also expects supporting assets (tokenizer, scheduler config, VAE config). See the GitHub repo for the full folder layout.
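A quick way to confirm the layout before launching the runtime is to check that the expected files exist. The sketch below uses a hypothetical helper (check_axmodels_dir is not part of the runtime repo); it only checks the four .axmodel names listed above, not the tokenizer or config assets.

```python
from pathlib import Path

# File names as listed in this README; vae_encoder is only needed for img2img / masked workflows.
REQUIRED = ("sd15_text_encoder_sim.axmodel", "unet.axmodel", "vae_decoder.axmodel")
OPTIONAL = ("vae_encoder.axmodel",)


def check_axmodels_dir(model_dir="./axmodels", need_encoder=False):
    """Return the expected .axmodel file names that are missing from model_dir (hypothetical helper)."""
    root = Path(model_dir)
    expected = REQUIRED + (OPTIONAL if need_encoder else ())
    return [name for name in expected if not (root / name).is_file()]
```

For example, check_axmodels_dir("./axmodels") returning an empty list means the three required files are in place.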
Expected model I/O (Euler 512)
Text encoder
- input: input_ids [1,77] int32
- output: last_hidden_state [1,77,768] fp32
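The text encoder expects exactly 77 token ids as int32. The sketch below shows that contract on ids that are already tokenized. The helper name to_input_ids and the special-token ids (49406 for <|startoftext|>, 49407 for <|endoftext|>, also used as padding) are assumptions based on the CLIP tokenizer commonly used with SD 1.5; confirm them against the tokenizer files the runtime repo ships.

```python
import numpy as np

# Assumed CLIP tokenizer ids for SD 1.5; verify against the runtime's tokenizer files.
BOS_ID = 49406   # <|startoftext|>
EOS_ID = 49407   # <|endoftext|>, also assumed to be the padding token
MAX_LEN = 77


def to_input_ids(token_ids, max_len=MAX_LEN, pad_id=EOS_ID):
    """Wrap raw token ids with BOS/EOS, truncate/pad to max_len, return int32 [1, max_len]."""
    body = list(token_ids)[: max_len - 2]      # leave room for BOS and EOS
    ids = [BOS_ID] + body + [EOS_ID]
    ids += [pad_id] * (max_len - len(ids))     # pad up to the fixed sequence length
    return np.asarray(ids, dtype=np.int32).reshape(1, max_len)
```

In practice a Hugging Face CLIPTokenizer called with padding="max_length", max_length=77, truncation=True and return_tensors="np" produces the same layout, but as int64, so its input_ids still need .astype(np.int32).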
UNet
- inputs:
  - sample [1,4,64,64] fp32 (512/8 = 64 latent resolution)
  - timestep [1] int32
  - encoder_hidden_states [1,77,768] fp32
- output: [1,4,64,64] fp32
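Because a scheduler may produce float timesteps and NumPy defaults to float64, it helps to cast every UNet input explicitly. A minimal sketch, assuming the input names listed above; make_unet_feed is a hypothetical helper, not part of the runtime:

```python
import numpy as np

# Input names, shapes, and dtypes as listed above for the Euler 512 UNet.
UNET_SPEC = {
    "sample": ((1, 4, 64, 64), np.float32),
    "timestep": ((1,), np.int32),
    "encoder_hidden_states": ((1, 77, 768), np.float32),
}


def make_unet_feed(sample, timestep, encoder_hidden_states):
    """Cast each UNet input to its expected dtype and check its shape."""
    raw = {
        "sample": sample,
        # Round first: a scheduler may produce float timesteps such as 946.42.
        "timestep": np.round(np.atleast_1d(timestep)),
        "encoder_hidden_states": encoder_hidden_states,
    }
    feed = {}
    for name, (shape, dtype) in UNET_SPEC.items():
        arr = np.asarray(raw[name]).astype(dtype)
        if arr.shape != shape:
            raise ValueError(f"{name}: expected shape {shape}, got {arr.shape}")
        feed[name] = arr
    return feed
```

The returned dict is keyed by input name, the form an onnxruntime-style run(None, feed) call takes; whether the runtime scripts call axengine that way is an assumption, so check the GitHub repo.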
VAE decoder
- input: latent [1,4,64,64] fp32
- output: [1,3,512,512] fp32 (commonly in [-1, 1] before postprocess)
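Turning the decoder output into a viewable image means applying the (x * 0.5 + 0.5) rescale and [0, 1] clamp mentioned under Troubleshooting, plus a layout change from CHW to HWC. A sketch, assuming the [-1, 1] output range noted above (decoder_output_to_uint8 is a hypothetical name):

```python
import numpy as np


def decoder_output_to_uint8(x):
    """Map a VAE decoder output [1, 3, H, W] (roughly [-1, 1]) to an H x W x 3 uint8 image."""
    x = np.asarray(x, dtype=np.float32)
    img = np.clip(x * 0.5 + 0.5, 0.0, 1.0)  # [-1, 1] -> [0, 1], clamp any overshoot
    img = img[0].transpose(1, 2, 0)         # drop batch dim, CHW -> HWC
    return (img * 255.0).round().astype(np.uint8)
```

The result can be saved with Pillow, e.g. Image.fromarray(img).save("out.png").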
VAE encoder (optional)
- input: image [1,3,512,512] fp32
- output: latent [1,4,64,64] fp32
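For img2img, the encoder input is the inverse of the decoder postprocess. This sketch assumes the encoder expects the same [-1, 1] range the decoder produces, which is the usual SD 1.5 convention but is not confirmed here for this compiled model; image_to_encoder_input is a hypothetical helper:

```python
import numpy as np


def image_to_encoder_input(rgb):
    """Map a 512 x 512 x 3 uint8 RGB image to a contiguous [1, 3, 512, 512] float32 array in [-1, 1]."""
    rgb = np.asarray(rgb)
    if rgb.shape != (512, 512, 3):
        raise ValueError(f"expected shape (512, 512, 3), got {rgb.shape}")
    x = rgb.astype(np.float32) / 127.5 - 1.0  # [0, 255] -> [-1, 1]
    # HWC -> NCHW; copy after the transpose because runtimes often want contiguous buffers.
    return np.ascontiguousarray(x.transpose(2, 0, 1)[np.newaxis])
```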
Runtime notes (important)
- The runtime scripts use EulerDiscreteScheduler.
- Ensure input_ids and timestep are int32 (int64 will fail in many AX pipelines; tokenizers and NumPy often produce int64 by default, so cast explicitly).
- Typical flow:
  - tokenize → text encoder
  - scheduler loop → UNet
  - VAE decode → image postprocess
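The scheduler loop step can be sketched in NumPy, assuming standard SD 1.5 scheduler settings and EulerDiscreteScheduler's default "linspace" timestep spacing; the scheduler config shipped with the runtime repo takes precedence over these values. unet is a hypothetical callable wrapping unet.axmodel, and classifier-free guidance is left out to keep the loop short.

```python
import numpy as np

# Assumed SD 1.5 scheduler settings: 1000 training steps, "scaled_linear" betas
# from 0.00085 to 0.012, epsilon prediction, "linspace" timestep spacing.
NUM_TRAIN_STEPS = 1000
BETA_START, BETA_END = 0.00085, 0.012


def euler_schedule(num_steps):
    """Return (int32 timesteps, sigmas with a trailing 0.0) for Euler sampling."""
    betas = np.linspace(BETA_START ** 0.5, BETA_END ** 0.5, NUM_TRAIN_STEPS) ** 2
    alphas_cumprod = np.cumprod(1.0 - betas)
    train_sigmas = np.sqrt((1.0 - alphas_cumprod) / alphas_cumprod)
    # Descending, evenly spaced timesteps from 999 to 0.
    t = np.linspace(0, NUM_TRAIN_STEPS - 1, num_steps)[::-1]
    sigmas = np.interp(t, np.arange(NUM_TRAIN_STEPS), train_sigmas)
    # Round for the int32 timestep input; sigmas keep the exact interpolated values.
    return np.round(t).astype(np.int32), np.append(sigmas, 0.0)


def euler_sample(unet, text_emb, num_steps=20, seed=0):
    """Euler loop. `unet(sample, timestep, text_emb)` is a hypothetical callable that
    wraps unet.axmodel and returns predicted noise of shape [1, 4, 64, 64]."""
    timesteps, sigmas = euler_schedule(num_steps)
    rng = np.random.default_rng(seed)
    # With "linspace" spacing the initial noise is scaled by the largest sigma.
    latents = (rng.standard_normal((1, 4, 64, 64)) * sigmas[0]).astype(np.float32)
    for i, t in enumerate(timesteps):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        # Scale the input so the UNet sees roughly unit-variance data.
        model_in = (latents / np.sqrt(sigma ** 2 + 1.0)).astype(np.float32)
        eps = unet(model_in, np.array([t], dtype=np.int32), text_emb)
        # Euler step: derivative (x - x0_pred) / sigma reduces to eps for epsilon prediction.
        latents = (latents + eps * (sigma_next - sigma)).astype(np.float32)
    return latents
```

With guidance, each step would run the UNet twice (empty-prompt and prompt embeddings) and combine eps_uncond + scale * (eps_cond - eps_uncond), the usual SD approach. The returned latents then go to the VAE decoder; standard SD 1.5 pipelines divide them by 0.18215 first, but whether this runtime does so (or bakes it into vae_decoder.axmodel) is an assumption to check.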
Base model: Realistic Vision
These compiled weights are derived from Realistic Vision, which is Stable Diffusion 1.5–based.
Troubleshooting
- If cloning is slow or the .axmodel files look tiny: you likely don’t have Git LFS installed.
  - Run git lfs install and re-clone.
- If the runtime says a model input type is wrong:
  - Verify that timestep is int32.
  - Verify that input_ids is int32.
- If outputs look washed out:
  - Check the VAE postprocess and scaling (decoder outputs typically need (x * 0.5 + 0.5), then a clamp to [0, 1]).
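Files cloned without LFS are small text pointers rather than model binaries. A minimal check, assuming the files sit in one directory (find_lfs_pointers is a hypothetical helper); it relies on the pointer format from the Git LFS specification, whose first line is version https://git-lfs.github.com/spec/v1:

```python
from pathlib import Path

# Git LFS pointer files are small text files that begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return names of .axmodel files in model_dir that are still LFS pointer stubs."""
    stubs = []
    for path in sorted(Path(model_dir).glob("*.axmodel")):
        with path.open("rb") as f:
            head = f.read(len(LFS_POINTER_PREFIX))  # only the first bytes are needed
        if head == LFS_POINTER_PREFIX:
            stubs.append(path.name)
    return stubs
```

If it reports any names, running git lfs install followed by git lfs pull inside the clone fetches the real files without re-cloning.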
Credits
- AX-M1 / AX8850 compilation and runtime packaging: Mojo24x7
- Base architecture: Stable Diffusion 1.5 family (Realistic Vision derivative)