---
pipeline_tag: text-to-image
library_name: diffusers
license: apache-2.0
tags:
- diffusion
- text-to-image
- photoroom
- prx
- open-source
- image-generation
- flow-matching
demo: https://huggingface.co/spaces/Photoroom/PRX-1024-beta-version
model_type: diffusion-transformer
inference: true
---
# PRX: Open Text-to-Image Generative Model

**PRX (Photoroom Experimental)** is a **1.3-billion-parameter text-to-image model trained entirely from scratch** and released under an **Apache 2.0 license**.
It is part of Photoroom’s broader effort to **open-source the complete process** behind training large-scale text-to-image models — covering architecture design, optimization strategies, and post-training alignment. The goal is to make PRX both a **strong open baseline** and a **transparent research reference** for those developing or studying diffusion-transformer models.
For more information, please read our [announcement blog post](https://huggingface.co/blog/Photoroom/prx-open-source-t2i-model).
## Model description
PRX is designed to be **lightweight yet capable**, easy to fine-tune or extend, and fully open.
PRX generates high-quality images from text using a simplified MMDiT architecture in which the text tokens are not updated through the transformer blocks. It uses flow matching with a discrete schedule for efficient sampling, and Google's T5-Gemma-2B-2B-UL2 model for multilingual text encoding. The model has around **1.3B parameters** and delivers fast inference without sacrificing quality. You can choose between the **Flux VAE** for balanced quality and speed, or **DC-AE** for higher latent compression and faster processing.
This card describes `Photoroom/prx-512-t2i-dc-ae`, one of the PRX model variants:
- **Resolution:** 512 pixels
- **Architecture:** PRX (MMDiT-like diffusion transformer variant)
- **Latent backbone:** [DC-AE VAE](https://arxiv.org/abs/2410.10733)
- **Text encoder:** T5-Gemma-2B-2B-UL2
- **Training stage:** Base model
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
For other checkpoints, browse the full [PRX collection](https://huggingface.co/collections/Photoroom/prx).
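To build intuition for the flow matching with discrete scheduling mentioned above, here is a minimal, self-contained sketch of Euler sampling over a discrete timestep grid. Note that `toy_velocity` and the uniform schedule are illustrative stand-ins, not the actual PRX transformer or the scheduler used by the pipeline:

```python
import torch

def toy_velocity(x, t):
    # Stand-in for the diffusion transformer, which in the real pipeline
    # predicts the flow velocity from latents, timestep, and text embeddings.
    return -x

def euler_flow_sample(shape, num_steps=28, generator=None):
    # Start from Gaussian noise at t=1 and integrate dx/dt = v(x, t)
    # down to t=0 with Euler steps on a uniform discrete schedule.
    x = torch.randn(shape, generator=generator)
    timesteps = torch.linspace(1.0, 0.0, num_steps + 1)
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        x = x + (t_next - t) * toy_velocity(x, t)  # dt is negative
    return x

latents = euler_flow_sample((1, 32, 16, 16), num_steps=28)
print(latents.shape)  # torch.Size([1, 32, 16, 16])
```

In the real pipeline this loop runs in latent space and the result is decoded by the DC-AE VAE; the latent shape above is arbitrary.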
## Example usage
You can use PRX directly in [Diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/prx):
```python
import torch
from diffusers.pipelines.prx import PRXPipeline

pipe = PRXPipeline.from_pretrained(
    "Photoroom/prx-512-t2i-dc-ae",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "A front-facing portrait of a lion in the golden savanna at sunset"
image = pipe(prompt, num_inference_steps=28, guidance_scale=5.0).images[0]
image.save("lion.png")
```
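As a rough intuition for the `guidance_scale` argument: under classifier-free guidance, the model's unconditional and text-conditioned predictions are combined so that higher scales push generations closer to the prompt. A minimal sketch (the function and tensors below are illustrative, not the pipeline's internals):

```python
import torch

def apply_cfg(pred_uncond, pred_cond, guidance_scale):
    # Push the prediction away from the unconditional branch,
    # toward the text-conditioned one.
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)

# guidance_scale=1.0 recovers the purely conditional prediction;
# larger values amplify the prompt's influence.
u = torch.zeros(4)
c = torch.ones(4)
print(apply_cfg(u, c, 5.0))  # tensor([5., 5., 5., 5.])
```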
## Visual examples and demo
Here are some examples from one of our best checkpoints so far ([Photoroom/prx-1024-t2i-beta](https://huggingface.co/Photoroom/prx-1024-t2i-beta)).
<div style="display:flex; justify-content:center; width:100%;">
<table style="border-collapse:collapse; width:100%; max-width:900px; table-layout:fixed;">
<tr>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/ljWO7dK-_CKypruXcApeN.webp" style="width:100%; height:auto;"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/IDHiXpRlUISeJxXtJM6fW.webp" style="width:100%; height:auto;"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/HemYHcMexWnAuYYor5Ztx.webp" style="width:100%; height:auto;"/></td>
</tr>
<tr>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/kEUd7dO_30ngn__scTH3M.webp" style="width:100%; height:auto;"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/jGkseXch9HWfB48Z-k5OX.webp" style="width:100%; height:auto;"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/5YnGFBiM1IHrzLh2h7q7t.webp" style="width:100%; height:auto;"/></td>
</tr>
<tr>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/OrMntTSvpE8GH1YrBNgZD.webp" style="width:100%; height:auto;"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/Aglz2CljITrEY4V-Q-P60.webp" style="width:100%; height:auto;"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/68d136d7307413e80188d819/h47OBkGOsKVmq51KSRaRu.webp" style="width:100%; height:auto;"/></td>
</tr>
</table>
</div>
[PRX Demo on Hugging Face Spaces](https://huggingface.co/spaces/Photoroom/PRX-1024-beta-version) — interactive text-to-image demo for `Photoroom/prx-1024-t2i-beta`.
## Training details
PRX models were trained from scratch using recent advances in diffusion and flow-matching training. We experimented with a range of modern techniques for efficiency, stability, and alignment, which we’ll cover in more detail in our upcoming series of research posts:
- [Part 0: Overview and release](https://huggingface.co/blog/Photoroom/prx-open-source-t2i-model)
- Part 1: Design experiments and architecture benchmark *(coming soon)*
- Part 2: Accelerating training *(coming soon)*
- Part 3: Post-pretraining *(coming soon)*
## Other PRX models
You can find additional checkpoints in the [PRX collection](https://huggingface.co/collections/Photoroom/prx):
- **Base** — pretrained model before alignment; the best starting point for fine-tuning or research
- **SFT** — supervised fine-tuned model; produces more aesthetically pleasing, ready-to-use generations
- **Latent backbones** — Flux's and DC-AE VAEs
- **Distilled** — 8-step generation with LADD
- **Resolutions** — 256, 512, and 1024 pixels
## License
PRX is available under an **Apache 2.0 license**.
## Use restrictions
You must not use PRX models for:
1. any of the restricted uses set forth in the [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy); or
2. any activity that violates applicable laws or regulations.