Improve model card for Kandinsky 5.0 T2I Lite with metadata, links, and usage
This PR enhances the model card for `Kandinsky 5.0 T2I Lite` by:
- Adding the `pipeline_tag: text-to-image` for better discoverability.
- Specifying `library_name: diffusers` as the model is compatible with the Hugging Face Diffusers library, based on explicit mentions and usage examples in the upstream GitHub repository.
- Including a direct link to the paper [Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation](https://huggingface.co/papers/2511.14993).
- Adding links to the official project page ([https://kandinskylab.ai/](https://kandinskylab.ai/)) and the GitHub repository ([https://github.com/kandinskylab/kandinsky-5](https://github.com/kandinskylab/kandinsky-5)).
- Providing a practical Python code snippet for Text-to-Image inference, directly extracted from the official GitHub README's "T2I Inference" section.
- Incorporating a concise description of the model's capabilities and relevant image examples.
- Adding the BibTeX citation for the primary paper for proper academic attribution.
These changes aim to make the model card more informative, discoverable, and user-friendly.
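
As a quick sanity check on the metadata changes listed above, the sketch below parses a model card's YAML front matter and reports which expected keys are missing or different. It is a minimal illustration, not part of this PR: the helper names `parse_front_matter` and `missing_or_wrong` are hypothetical, the parser only handles flat `key: value` lines (a real card may need a full YAML parser), and the two sample strings are reduced stand-ins for the card before and after the change.

```python
import re

# Metadata the updated card is expected to declare (taken from the diff in this PR).
REQUIRED = {
    "license": "mit",
    "pipeline_tag": "text-to-image",
    "library_name": "diffusers",
}


def parse_front_matter(card_text: str) -> dict[str, str]:
    """Return flat key/value pairs from a leading '---' ... '---' block."""
    match = re.match(
        r"\A---[ \t]*\n(.*?)\n---[ \t]*(?:\n|\Z)", card_text, flags=re.DOTALL
    )
    if match is None:
        return {}
    metadata = {}
    for line in match.group(1).splitlines():
        # Skip comments and anything that is not a simple "key: value" line.
        if ":" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            metadata[key.strip()] = value.strip()
    return metadata


def missing_or_wrong(card_text: str, required: dict[str, str] = REQUIRED) -> dict:
    """Map each required key to the value actually found when it does not match."""
    found = parse_front_matter(card_text)
    return {key: found.get(key) for key, want in required.items() if found.get(key) != want}


# Reduced stand-ins for the card before and after this PR (assumed, for illustration).
old_card = "---\nlicense: mit\n---\n"
new_card = (
    "---\nlicense: mit\npipeline_tag: text-to-image\n"
    "library_name: diffusers\n---\n\n# Kandinsky 5.0 T2I Lite\n"
)

print(missing_or_wrong(old_card))
print(missing_or_wrong(new_card))
```

An empty result for the updated card means all three expected keys are present with the expected values.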
|
@@ -1,3 +1,109 @@
|
|
| 1 |
---
|
| 2 |
license: mit
|
| 3 |
-
---
|
| 1 |
---
|
| 2 |
license: mit
|
| 3 |
+
pipeline_tag: text-to-image
|
| 4 |
+
library_name: diffusers
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
<div align="center">
|
| 8 |
+
<picture>
|
| 9 |
+
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/kandinskylab/kandinsky-5/raw/main/assets/KANDINSKY_LOGO_1_WHITE.png">
|
| 10 |
+
<source media="(prefers-color-scheme: light)" srcset="https://github.com/kandinskylab/kandinsky-5/raw/main/assets/KANDINSKY_LOGO_1_BLACK.png">
|
| 11 |
+
<img alt="Kandinsky Logo" src="https://user-images.githubusercontent.com/25423296/163456779-a8556205-d0a5-45e2-ac17-42d089e3c3f8.png">
|
| 12 |
+
</picture>
|
| 13 |
+
</div>
|
| 14 |
+
|
| 15 |
+
# Kandinsky 5.0 T2I Lite
|
| 16 |
+
|
| 17 |
+
This repository contains the Kandinsky 5.0 Text-to-Image Lite model, part of the Kandinsky 5.0 family of foundation models for image and video generation, introduced in the paper [Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation](https://huggingface.co/papers/2511.14993).
|
| 18 |
+
|
| 19 |
+
* **Project Page**: https://kandinskylab.ai/
|
| 20 |
+
* **GitHub Repository**: https://github.com/kandinskylab/kandinsky-5
|
| 21 |
+
* **Diffusers documentation**: https://huggingface.co/docs/diffusers/main/en/api/pipelines/kandinsky5
|
| 22 |
+
|
| 23 |
+
## Model Capabilities
|
| 24 |
+
|
| 25 |
+
Kandinsky 5.0 Image Lite is a line-up of 6B image generation models with the following capabilities:
|
| 26 |
+
|
| 27 |
+
* 1K resolution (1280x768, 1024x1024 and others).
|
| 28 |
+
* High visual quality
|
| 29 |
+
* Strong text-writing
|
| 30 |
+
* Understanding of Russian concepts
|
| 31 |
+
|
| 32 |
+
## Examples
|
| 33 |
+
|
| 34 |
+
<table border="0" style="width: 200; text-align: left; margin-top: 20px;">
|
| 35 |
+
<tr>
|
| 36 |
+
<td>
|
| 37 |
+
<image src="https://github.com/user-attachments/assets/f46e6866-15ce-445d-bb81-9843a341e2a9" width=200 ></image>
|
| 38 |
+
</td>
|
| 39 |
+
<td>
|
| 40 |
+
<image src="https://github.com/user-attachments/assets/74f3af1f-b11e-4174-9f36-e956b871a6e6" width=200 ></image>
|
| 41 |
+
</td>
|
| 42 |
+
<td>
|
| 43 |
+
<image src="https://github.com/user-attachments/assets/7e469d09-8b96-4691-b929-dd809827adf9" width=200 ></image>
|
| 44 |
+
</td>
|
| 45 |
+
</tr>
|
| 46 |
+
</table>
|
| 47 |
+
<table border="0" style="width: 200; text-align: left; margin-top: 10px;">
|
| 48 |
+
<td>
|
| 49 |
+
<image src="https://github.com/user-attachments/assets/8054b25b-5d71-4547-8822-b07d71d137f4" width=200 ></image>
|
| 50 |
+
</td>
|
| 51 |
+
<td>
|
| 52 |
+
<image src="https://github.com/user-attachments/assets/f4825237-640b-4b2d-86e6-fd08fe95039f" width=200 ></image>
|
| 53 |
+
</td>
|
| 54 |
+
<td>
|
| 55 |
+
<image src="https://github.com/user-attachments/assets/73fbbc2a-3249-4b70-8931-2893ab0107a5" width=200 ></image>
|
| 56 |
+
</td>
|
| 57 |
+
|
| 58 |
+
</table>
|
| 59 |
+
<table border="0" style="width: 200; text-align: left; margin-top: 10px;">
|
| 60 |
+
<td>
|
| 61 |
+
<image src="https://github.com/user-attachments/assets/c309650b-8d8b-4e44-bb63-48287e22ff44" width=200 ></image>
|
| 62 |
+
</td>
|
| 63 |
+
<td>
|
| 64 |
+
<image src="https://github.com/user-attachments/assets/d5c0fcca-69b7-4d77-9c36-cd2fb87f2615" width=200 ></image>
|
| 65 |
+
</td>
|
| 66 |
+
<td>
|
| 67 |
+
<image src="https://github.com/user-attachments/assets/7895c3e8-2e72-40b8-8bf7-dcac859a6b29" width=200 ></image>
|
| 68 |
+
</td>
|
| 69 |
+
|
| 70 |
+
</table>
|
| 71 |
+
|
| 72 |
+
## How to use
|
| 73 |
+
|
| 74 |
+
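You can run text-to-image generation with the `kandinsky` package from the GitHub repository, which integrates with Hugging Face `diffusers`. The example below places all three components on a single GPU and resolves `conf_path` relative to the current working directory.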
|
| 75 |
+
|
| 76 |
+
```python
|
| 77 |
+
import torch
|
| 78 |
+
from kandinsky import get_T2I_pipeline
|
| 79 |
+
|
| 80 |
+
device_map = {
|
| 81 |
+
    "dit": torch.device('cuda:0'),
|
| 82 |
+
    "vae": torch.device('cuda:0'),
|
| 83 |
+
    "text_embedder": torch.device('cuda:0')
|
| 84 |
+
}
|
| 85 |
+
|
| 86 |
+
pipe = get_T2I_pipeline(device_map, conf_path="configs/k5_lite_t2i_sft_hd.yaml")
|
| 87 |
+
|
| 88 |
+
images = pipe(
|
| 89 |
+
    seed=42,
|
| 90 |
+
    save_path='./test.png',
|
| 91 |
+
    text="A cat in a red hat with a label 'HELLO'"
|
| 92 |
+
)
|
| 93 |
+
```
|
| 94 |
+
|
| 95 |
+
## Citation
|
| 96 |
+
|
| 97 |
+
If you find our work helpful, please cite our paper:
|
| 98 |
+
|
| 99 |
+
```bibtex
|
| 100 |
+
@misc{arkhipkin2025kandinsky50familyfoundation,
|
| 101 |
+
title={Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation},
|
| 102 |
+
author={Vladimir Arkhipkin and Vladimir Korviakov and Nikolai Gerasimenko and Denis Parkhomenko and Viacheslav Vasilev and Alexey Letunovskiy and Nikolai Vaulin and Maria Kovaleva and Ivan Kirillov and Lev Novitskiy and Denis Koposov and Nikita Kiselev and Alexander Varlamov and Dmitrii Mikhailov and Vladimir Polovnikov and Andrey Shutkin and Julia Agafonova and Ilya Vasiliev and Anastasiia Kargapoltseva and Anna Dmitrienko and Anastasia Maltseva and Anna Averchenkova and Olga Kim and Tatiana Nikulina and Denis Dimitrov},
|
| 103 |
+
year={2025},
|
| 104 |
+
eprint={2511.14993},
|
| 105 |
+
archivePrefix={arXiv},
|
| 106 |
+
primaryClass={cs.CV},
|
| 107 |
+
url={https://arxiv.org/abs/2511.14993},
|
| 108 |
+
}
|
| 109 |
+
```
|