AX-M1 (AX8850) — Realistic Vision (SD 1.5) Euler 512 (AXMODEL weights)

This repository hosts the compiled .axmodel weights for running Realistic Vision (Stable Diffusion 1.5–based) with Euler / EulerDiscreteScheduler at 512×512 on Radxa AI Core AX-M1 (AX8850).

Runtime / scripts (GitHub): https://github.com/Mojo24x7/SD1.5_AXM1-AX8850_Euler

These files are compiled AXERA artifacts (.axmodel) intended for AX-M1 / AX8850 inference via AXCLRT/axengine. They are not raw PyTorch weights.

What’s inside

Main weights:

sd15_text_encoder_sim.axmodel — CLIP text encoder (prompt → text embeddings)
unet.axmodel — UNet denoiser (latent diffusion core)
vae_decoder.axmodel — VAE decoder (latent → RGB image)

Optional (only needed for img2img / masked workflows):

vae_encoder.axmodel — VAE encoder (RGB → latent)

Download

Option A — Git LFS (recommended)

git lfs install
git clone https://huggingface.co/Mojo24x7/sd15-axm1-euler512-axmodels

Option B — Hugging Face CLI

pip install -U "huggingface_hub[cli]"
huggingface-cli download Mojo24x7/sd15-axm1-euler512-axmodels \
  --local-dir sd15-axm1-euler512-axmodels
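
If you cloned with Git (Option A) without LFS set up, the .axmodel files will be small text stubs rather than the compiled weights. The sketch below is one way to spot that, assuming the standard Git LFS pointer header; the function names are illustrative and not part of the runtime repo.

```python
from pathlib import Path

# First line of a Git LFS pointer file (per the Git LFS pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(model_dir):
    """List .axmodel files in model_dir that are still LFS pointer stubs."""
    return sorted(
        p.name for p in Path(model_dir).glob("*.axmodel") if is_lfs_pointer(p)
    )
```

Calling find_lfs_pointers("sd15-axm1-euler512-axmodels") should return an empty list once the real weights are present.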

Where to place the files

In the runtime repo, place these into:

./axmodels/
  sd15_text_encoder_sim.axmodel
  unet.axmodel
  vae_decoder.axmodel
  vae_encoder.axmodel   (optional)

The runtime/scripts repo also expects supporting assets (tokenizer, scheduler config, VAE config). See the GitHub repo for the full folder layout.
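
A quick way to confirm the layout above before launching the runtime is sketched here. It only checks the four file names listed in this README; check_axmodels is a hypothetical helper, and the supporting assets (tokenizer, configs) are not covered.

```python
from pathlib import Path

# File names taken from the folder layout in this README.
REQUIRED = (
    "sd15_text_encoder_sim.axmodel",
    "unet.axmodel",
    "vae_decoder.axmodel",
)
OPTIONAL = ("vae_encoder.axmodel",)


def check_axmodels(model_dir="./axmodels"):
    """Return (missing_required, missing_optional) file names for model_dir."""
    root = Path(model_dir)
    missing_required = [n for n in REQUIRED if not (root / n).is_file()]
    missing_optional = [n for n in OPTIONAL if not (root / n).is_file()]
    return missing_required, missing_optional
```

An empty first list means text-to-image should have the models it needs; a missing vae_encoder.axmodel only matters for img2img / masked workflows.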

Expected model I/O (Euler 512)

Text encoder

input: input_ids [1,77] int32
output: last_hidden_state [1,77,768] fp32

UNet

inputs:
- sample [1,4,64,64] fp32 (512/8 = 64 latent resolution)
- timestep [1] int32
- encoder_hidden_states [1,77,768] fp32
output: [1,4,64,64] fp32

VAE decoder

input: latent [1,4,64,64] fp32
output: [1,3,512,512] fp32 (commonly in [-1..1] before postprocess)

VAE encoder (optional)

input: image [1,3,512,512] fp32
output: latent [1,4,64,64] fp32
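
The shapes listed above all follow from the 512×512 image size and the 8× VAE downscale. The sketch below derives them so they can be compared against what a loaded model reports; the dictionary keys and the expected_shapes name are illustrative only.

```python
def expected_shapes(image_size=512, vae_factor=8, latent_channels=4,
                    seq_len=77, hidden=768):
    """Derive the tensor shapes listed in this README from the image size."""
    if image_size % vae_factor != 0:
        raise ValueError("image_size must be a multiple of the VAE factor")
    latent = image_size // vae_factor  # 512 / 8 = 64
    latent_shape = (1, latent_channels, latent, latent)
    image_shape = (1, 3, image_size, image_size)
    return {
        "text_encoder.input_ids": (1, seq_len),
        "text_encoder.last_hidden_state": (1, seq_len, hidden),
        "unet.sample": latent_shape,
        "unet.timestep": (1,),
        "unet.encoder_hidden_states": (1, seq_len, hidden),
        "unet.output": latent_shape,
        "vae_decoder.latent": latent_shape,
        "vae_decoder.output": image_shape,
        "vae_encoder.image": image_shape,
        "vae_encoder.latent": latent_shape,
    }
```

These models are compiled for a fixed 512×512 size, so the defaults are the only values expected to match the shipped files.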

Runtime notes (important)

The runtime scripts use EulerDiscreteScheduler.
Ensure input_ids and timestep are int32 (int64 will fail in many AX pipelines).
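
Tokenizers commonly return int64 ids and schedulers often yield float timesteps, so an explicit cast is usually needed. A minimal NumPy sketch follows; the helper names are hypothetical, and rounding the timestep to the nearest integer is an assumption rather than something the runtime repo is confirmed to do.

```python
import numpy as np

MAX_TOKENS = 77  # sequence length from the text encoder I/O listing


def prepare_input_ids(token_ids):
    """Cast already padded/truncated token ids to an int32 array of shape [1, 77]."""
    ids = np.asarray(token_ids, dtype=np.int64).reshape(1, -1)
    if ids.shape[1] != MAX_TOKENS:
        raise ValueError(f"expected {MAX_TOKENS} tokens, got {ids.shape[1]}")
    return ids.astype(np.int32)


def prepare_timestep(t):
    """Convert a scalar timestep to an int32 array of shape [1].

    Assumption: fractional timesteps are rounded to the nearest integer.
    """
    return np.asarray([int(round(float(t)))], dtype=np.int32)
```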
Typical flow:
1. tokenize → text encoder
2. scheduler loop → UNet
3. VAE decode → image postprocess
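
Step 2 of the flow above can be sketched as a plain Euler loop. This is a simplified illustration of what an Euler discrete sampler does for an epsilon-predicting UNet, not the runtime repo's code: unet is a hypothetical callable wrapping unet.axmodel, the caller supplies sigmas (one more entry than timesteps, ending in 0.0), and guidance and schedule construction are left out.

```python
import numpy as np


def euler_sample(unet, noise, sigmas, timesteps, text_emb):
    """Run a basic Euler loop over the given sigma schedule.

    unet(sample, timestep, encoder_hidden_states) -> predicted noise
    (hypothetical wrapper; same shapes and dtypes as in the I/O listing).
    """
    # Start from unit-variance noise scaled to the largest sigma.
    x = np.asarray(noise, dtype=np.float32) * float(sigmas[0])
    for i, t in enumerate(timesteps):
        sigma, sigma_next = float(sigmas[i]), float(sigmas[i + 1])
        # Scale the model input as Euler discrete schedulers do.
        model_in = (x / np.sqrt(sigma ** 2 + 1.0)).astype(np.float32)
        # Assumption: timesteps are rounded to integers for the int32 input.
        t_arr = np.asarray([int(round(float(t)))], dtype=np.int32)
        eps = unet(model_in, t_arr, text_emb)
        # Euler step: for epsilon prediction the derivative equals eps.
        x = x + (sigma_next - sigma) * eps
    return x.astype(np.float32)
```

The returned latent then goes to the VAE decoder; whether any latent scaling is applied before decoding depends on how the decoder was exported, so check the runtime repo for that detail.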

Base model: Realistic Vision

These compiled weights are derived from Realistic Vision, which is Stable Diffusion 1.5–based.

Troubleshooting

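If cloning is slow or the .axmodel files are tiny (around a hundred bytes): Git LFS is probably not installed, so you received pointer files instead of the weights.
- Run git lfs install, then re-clone (or run git lfs pull inside the existing clone).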
If the runtime says a model input type is wrong:
- Verify timestep is int32
- Verify input_ids is int32
If outputs look washed out:
- Check the VAE postprocess and scaling: the decoder output typically needs x * 0.5 + 0.5, then a clamp to [0..1].
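
The postprocess from the last item can be sketched in NumPy as follows, assuming the decoder returns a [1,3,H,W] fp32 array in roughly [-1..1]; the postprocess name and the uint8 HWC output layout are choices made for this example.

```python
import numpy as np


def postprocess(decoded):
    """Map a [1, 3, H, W] decoder output in [-1, 1] to an HWC uint8 image."""
    x = np.asarray(decoded, dtype=np.float32)
    x = np.clip(x * 0.5 + 0.5, 0.0, 1.0)  # [-1, 1] -> [0, 1], then clamp
    x = np.transpose(x[0], (1, 2, 0))     # drop batch, CHW -> HWC
    return (x * 255.0).round().astype(np.uint8)
```

The result can be handed to an image library of your choice for saving. If images still look washed out after this step, the input latent scaling is the next thing to check.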

Credits

AX-M1 / AX8850 compilation and runtime packaging: Mojo24x7
Base architecture: Stable Diffusion 1.5 family (Realistic Vision derivative)
