---
pipeline_tag: text-to-image
library_name: diffusers
license: apache-2.0
tags:
- diffusion
- text-to-image
- photoroom
- prx
- open-source
- image-generation
- flow-matching
demo: https://huggingface.co/spaces/Photoroom/PRX-1024-beta-version
model_type: diffusion-transformer
inference: true
---
# PRX: Open Text-to-Image Generative Model

**PRX (Photoroom Experimental)** is a **1.3-billion-parameter text-to-image model trained entirely from scratch** and released under an **Apache 2.0 license**.
It is part of Photoroom’s broader effort to **open-source the complete process** behind training large-scale text-to-image models — covering architecture design, optimization strategies, and post-training alignment. The goal is to make PRX both a **strong open baseline** and a **transparent research reference** for those developing or studying diffusion-transformer models.
For more information, please read our [announcement blog post](https://huggingface.co/blog/Photoroom/prx-open-source-t2i-model).
## Model description
PRX is designed to be **lightweight yet capable**, easy to fine-tune or extend, and fully open.
PRX generates high-quality images from text using a simplified MMDiT architecture in which the text tokens are not updated through the transformer blocks. It uses flow matching with a discrete schedule for efficient sampling, and Google's T5-Gemma-2B-2B-UL2 model for multilingual text encoding. The model has around **1.3B parameters** and delivers fast inference without sacrificing quality. You can choose between the **Flux VAE** for balanced quality and speed, or **DC-AE** for higher latent compression and faster processing.
This card describes `Photoroom/prx-512-t2i-dc-ae`, one of the PRX model variants:
- **Resolution:** 512 pixels
- **Architecture:** PRX (MMDiT-like diffusion transformer variant)
- **Latent backbone:** [DC-AE VAE](https://arxiv.org/abs/2410.10733)
- **Text encoder:** T5-Gemma-2B-2B-UL2
- **Training stage:** Base model
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
For other checkpoints, browse the full [PRX collection](https://huggingface.co/collections/Photoroom/prx).
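To build intuition for the flow matching with discrete scheduling mentioned above, here is a minimal, self-contained sketch of Euler sampling over a discrete timestep grid. Note that `toy_velocity` and the uniform schedule are illustrative stand-ins, not the actual PRX transformer or the scheduler used by the pipeline:

```python
import torch

def toy_velocity(x, t):
    # Stand-in for the diffusion transformer, which in the real pipeline
    # predicts the flow velocity from latents, timestep, and text embeddings.
    return -x

def euler_flow_sample(shape, num_steps=28, generator=None):
    # Start from Gaussian noise at t=1 and integrate dx/dt = v(x, t)
    # down to t=0 with Euler steps on a uniform discrete schedule.
    x = torch.randn(shape, generator=generator)
    timesteps = torch.linspace(1.0, 0.0, num_steps + 1)
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        x = x + (t_next - t) * toy_velocity(x, t)  # dt is negative
    return x

latents = euler_flow_sample((1, 32, 16, 16), num_steps=28)
print(latents.shape)  # torch.Size([1, 32, 16, 16])
```

In the real pipeline this loop runs in latent space and the result is decoded by the DC-AE VAE; the latent shape above is arbitrary.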
## Example usage
You can use PRX directly in [Diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/prx):
```python
import torch
from diffusers.pipelines.prx import PRXPipeline

pipe = PRXPipeline.from_pretrained(
    "Photoroom/prx-512-t2i-dc-ae",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "A front-facing portrait of a lion in the golden savanna at sunset"
image = pipe(prompt, num_inference_steps=28, guidance_scale=5.0).images[0]
image.save("lion.png")
```
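As a rough intuition for the `guidance_scale` argument: under classifier-free guidance, the model's unconditional and text-conditioned predictions are combined so that higher scales push generations closer to the prompt. A minimal sketch (the function and tensors below are illustrative, not the pipeline's internals):

```python
import torch

def apply_cfg(pred_uncond, pred_cond, guidance_scale):
    # Push the prediction away from the unconditional branch,
    # toward the text-conditioned one.
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)

# guidance_scale=1.0 recovers the purely conditional prediction;
# larger values amplify the prompt's influence.
u = torch.zeros(4)
c = torch.ones(4)
print(apply_cfg(u, c, 5.0))  # tensor([5., 5., 5., 5.])
```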
## Visual examples and demo
Here are some examples from one of our best checkpoints so far ([Photoroom/prx-1024-t2i-beta](https://huggingface.co/Photoroom/prx-1024-t2i-beta)).
<div style="display:flex; justify-content:center; width:100%;">
<table style="border-collapse:collapse; width:100%; max-width:900px; table-layout:fixed;">
<tr>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/ljWO7dK-_CKypruXcApeN.webp" style="width:100%; height:auto;"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/IDHiXpRlUISeJxXtJM6fW.webp" style="width:100%; height:auto;"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/HemYHcMexWnAuYYor5Ztx.webp" style="width:100%; height:auto;"/></td>
</tr>
<tr>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/kEUd7dO_30ngn__scTH3M.webp" style="width:100%; height:auto;"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/jGkseXch9HWfB48Z-k5OX.webp" style="width:100%; height:auto;"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/5YnGFBiM1IHrzLh2h7q7t.webp" style="width:100%; height:auto;"/></td>
</tr>
<tr>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/OrMntTSvpE8GH1YrBNgZD.webp" style="width:100%; height:auto;"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/Aglz2CljITrEY4V-Q-P60.webp" style="width:100%; height:auto;"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/h47OBkGOsKVmq51KSRaRu.webp" style="width:100%; height:auto;"/></td>
</tr>
</table>
</div>
[PRX Demo on Hugging Face Spaces](https://huggingface.co/spaces/Photoroom/PRX-1024-beta-version) — interactive text-to-image demo for `Photoroom/prx-1024-t2i-beta`.
## Training details
PRX models were trained from scratch using recent advances in diffusion and flow-matching training. We experimented with a range of modern techniques for efficiency, stability, and alignment, which we’ll cover in more detail in our upcoming series of research posts:
- [Part 0: Overview and release](https://huggingface.co/blog/Photoroom/prx-open-source-t2i-model)
- Part 1: Design experiments and architecture benchmark *(coming soon)*
- Part 2: Accelerating training *(coming soon)*
- Part 3: Post-pretraining *(coming soon)*
## Other PRX models
You can find additional checkpoints in the [PRX collection](https://huggingface.co/collections/Photoroom/prx):
- **Base** — pretrained model before alignment; the best starting point for fine-tuning or research
- **SFT** — supervised fine-tuned model; produces more aesthetically pleasing, ready-to-use generations
- **Latent backbones** — Flux's and DC-AE VAEs
- **Distilled** — 8-step generation with LADD
- **Resolutions** — 256, 512, and 1024 pixels
## License
PRX is available under an **Apache 2.0 license**.
## Use restrictions
You must not use PRX models for:
1. any of the restricted uses set forth in the [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy); or
2. any activity that violates applicable laws or regulations.