AX-M1 (AX8850) — Realistic Vision (SD 1.5) Euler 512 (AXMODEL weights)

This repository hosts the compiled .axmodel weights for running Realistic Vision (Stable Diffusion 1.5–based) with Euler / EulerDiscreteScheduler at 512×512 on Radxa AI Core AX-M1 (AX8850).

Runtime / scripts (GitHub): https://github.com/Mojo24x7/SD1.5_AXM1-AX8850_Euler

These files are compiled AXERA artifacts (.axmodel) intended for AX-M1 / AX8850 inference via AXCLRT/axengine. They are not raw PyTorch weights.


What’s inside

Main weights:

  • sd15_text_encoder_sim.axmodel — CLIP text encoder (prompt → text embeddings)
  • unet.axmodel — UNet denoiser (latent diffusion core)
  • vae_decoder.axmodel — VAE decoder (latent → RGB image)

Optional (needed for img2img / masked workflows):

  • vae_encoder.axmodel — VAE encoder (RGB → latent)

Download

Option A — Git LFS (recommended)

git lfs install
git clone https://huggingface.co/Mojo24x7/sd15-axm1-euler512-axmodels

Option B — Hugging Face CLI

pip install -U "huggingface_hub[cli]"
huggingface-cli download Mojo24x7/sd15-axm1-euler512-axmodels \
  --local-dir sd15-axm1-euler512-axmodels
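
Whichever option you use, it can be worth confirming that the downloaded .axmodel files are real binaries and not Git LFS pointer stubs. The following is a minimal sketch, not part of the runtime repo: the helper names are hypothetical, and it relies only on the fact that a Git LFS pointer file begins with the line "version https://git-lfs.github.com/spec/v1".

```python
from pathlib import Path

# Every Git LFS pointer file starts with this exact text.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is a Git LFS pointer stub, not the real binary."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(model_dir):
    """List the names of .axmodel files in model_dir that are still pointer stubs."""
    return sorted(
        p.name for p in Path(model_dir).glob("*.axmodel") if is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # Assumes the clone / download directory name used in the commands above.
    stubs = find_lfs_pointers("sd15-axm1-euler512-axmodels")
    print("pointer stubs:", stubs if stubs else "none")
```

An empty result means every .axmodel file found in that directory holds real data.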

Where to place the files

In the runtime repo, place these into:

./axmodels/
  sd15_text_encoder_sim.axmodel
  unet.axmodel
  vae_decoder.axmodel
  vae_encoder.axmodel   (optional)

The runtime/scripts repo also expects supporting assets (tokenizer, scheduler config, VAE config). See the GitHub repo for the full folder layout.
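
To confirm the ./axmodels/ layout listed above before launching the runtime, a small check like the one below may help. It is an illustrative sketch only: the function name is hypothetical, and it does not check the supporting assets (tokenizer, scheduler config, VAE config), whose layout is defined in the GitHub repo.

```python
from pathlib import Path

# File names taken from the layout listed above.
REQUIRED = (
    "sd15_text_encoder_sim.axmodel",
    "unet.axmodel",
    "vae_decoder.axmodel",
)
OPTIONAL = ("vae_encoder.axmodel",)  # only needed for img2img / masked workflows


def check_axmodels(repo_root="."):
    """Return (missing_required, missing_optional) for <repo_root>/axmodels/."""
    model_dir = Path(repo_root) / "axmodels"
    missing_required = [n for n in REQUIRED if not (model_dir / n).is_file()]
    missing_optional = [n for n in OPTIONAL if not (model_dir / n).is_file()]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_axmodels(".")
    print("missing required:", required if required else "none")
    print("missing optional:", optional if optional else "none")
```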


Expected model I/O (Euler 512)

Text encoder

  • input: input_ids [1,77] int32
  • output: last_hidden_state [1,77,768] fp32

UNet

  • inputs:
    • sample [1,4,64,64] fp32 (512/8 = 64 latent resolution)
    • timestep [1] int32
    • encoder_hidden_states [1,77,768] fp32
  • output: [1,4,64,64] fp32

VAE decoder

  • input: latent [1,4,64,64] fp32
  • output: [1,3,512,512] fp32 (commonly in [-1..1] before postprocess)

VAE encoder (optional)

  • input: image [1,3,512,512] fp32
  • output: latent [1,4,64,64] fp32
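
The shapes and dtypes above can be captured in a small lookup table and checked before each inference call. This is a hedged sketch using NumPy: the tensor names follow the labels used in this section and may differ from the names stored inside the compiled models, and check_tensor is a hypothetical helper, not a function from the runtime repo.

```python
import numpy as np

# name -> (expected shape, expected dtype), copied from the I/O listing above.
IO_SPEC = {
    "input_ids": ((1, 77), np.int32),
    "last_hidden_state": ((1, 77, 768), np.float32),
    "sample": ((1, 4, 64, 64), np.float32),
    "timestep": ((1,), np.int32),
    "encoder_hidden_states": ((1, 77, 768), np.float32),
    "latent": ((1, 4, 64, 64), np.float32),
    "image": ((1, 3, 512, 512), np.float32),
}


def check_tensor(name, array):
    """Raise if `array` does not match the documented shape and dtype for `name`."""
    shape, dtype = IO_SPEC[name]
    arr = np.asarray(array)
    if arr.shape != shape:
        raise ValueError(f"{name}: expected shape {shape}, got {arr.shape}")
    if arr.dtype != dtype:
        raise TypeError(f"{name}: expected {np.dtype(dtype)}, got {arr.dtype}")
    return arr


if __name__ == "__main__":
    check_tensor("sample", np.zeros((1, 4, 64, 64), dtype=np.float32))
    print("sample tensor matches the documented spec")
```

Failing early on a dtype or shape mismatch tends to give a clearer message than an error raised deep inside the inference runtime.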

Runtime notes (important)

  • The runtime scripts use EulerDiscreteScheduler.
  • Ensure input_ids and timestep are int32 (int64 will fail in many AX pipelines).
  • Typical flow:
    1. tokenize → text encoder
    2. scheduler loop → UNet
    3. VAE decode → image postprocess
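
As a rough illustration of step 2, the sketch below shows a plain Euler denoising loop for an epsilon-prediction model, with timestep cast to int32 before each UNet call. It is written under assumptions and is not the runtime repo's code: unet is a hypothetical callable wrapping the compiled model, sigmas and timesteps are assumed to come from the scheduler (with one more sigma than timesteps, ending in 0), and classifier-free guidance is omitted for brevity.

```python
import numpy as np


def euler_denoise(unet, latents, text_emb, sigmas, timesteps):
    """Minimal Euler loop; `unet(sample, timestep, encoder_hidden_states)` is hypothetical."""
    x = np.asarray(latents, dtype=np.float32)
    for i, t in enumerate(timesteps):
        sigma, sigma_next = float(sigmas[i]), float(sigmas[i + 1])
        # Scale the model input the way Euler schedulers do for epsilon prediction.
        x_in = (x / np.sqrt(sigma**2 + 1.0)).astype(np.float32)
        # The compiled UNet expects timestep as int32 with shape [1].
        t_arr = np.array([t], dtype=np.int32)
        eps = unet(x_in, t_arr, text_emb)
        # Euler step: for epsilon prediction the derivative equals eps.
        x = (x + eps * (sigma_next - sigma)).astype(np.float32)
    return x


if __name__ == "__main__":
    # Stand-in UNet that predicts zero noise, so the latents pass through unchanged.
    dummy_unet = lambda sample, timestep, emb: np.zeros_like(sample)
    out = euler_denoise(
        dummy_unet,
        np.ones((1, 4, 64, 64), dtype=np.float32),
        np.zeros((1, 77, 768), dtype=np.float32),
        sigmas=[2.0, 1.0, 0.0],
        timesteps=[999, 500],
    )
    print(out.shape, out.dtype)
```

In practice the starting latents would also be scaled by the scheduler's initial noise sigma, and the final x would be divided by the VAE scaling factor before decoding; both details depend on the configs shipped with the runtime repo.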

Base model: Realistic Vision

These compiled weights are derived from Realistic Vision, which is Stable Diffusion 1.5–based.


Troubleshooting

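  • If the .axmodel files are only a few hundred bytes after cloning: they are Git LFS pointer files, which usually means Git LFS was not installed when you cloned.
    • Run git lfs install and re-clone, or run git lfs pull inside the existing clone.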
  • If the runtime says a model input type is wrong:
    • Verify timestep is int32
    • Verify input_ids is int32
  • If outputs look washed out:
    • Check the VAE postprocess and scaling: the decoder output is typically in [-1..1], so apply x * 0.5 + 0.5 and then clamp to [0..1].
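
The postprocess step mentioned in the last item can be sketched with NumPy as follows. This assumes the decoder output has shape [1,3,H,W] with values roughly in [-1..1], as described under Expected model I/O; the function name is hypothetical and the runtime scripts may implement this differently.

```python
import numpy as np


def postprocess(decoded):
    """Convert VAE decoder output [1,3,H,W] in about [-1, 1] to a uint8 [H,W,3] image."""
    x = np.asarray(decoded, dtype=np.float32)
    x = np.clip(x * 0.5 + 0.5, 0.0, 1.0)   # map [-1, 1] to [0, 1], then clamp
    x = np.transpose(x[0], (1, 2, 0))       # drop batch dim, CHW -> HWC
    return np.round(x * 255.0).astype(np.uint8)


if __name__ == "__main__":
    fake_output = np.random.uniform(-1.2, 1.2, size=(1, 3, 512, 512)).astype(np.float32)
    image = postprocess(fake_output)
    print(image.shape, image.dtype, image.min(), image.max())
```

Skipping the 0.5 scale and offset, or skipping the clamp, is a common cause of washed-out or clipped images.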

Credits

  • AX-M1 / AX8850 compilation and runtime packaging: Mojo24x7
  • Base architecture: Stable Diffusion 1.5 family (Realistic Vision derivative)