# Ming-Lite-Omni – Rich Layout LoRA

A LoRA adapter fine-tuned on top of inclusionAI/Ming-Lite-Omni to teach the model layout design generation: given a natural-language user intent and a reference image, the model outputs a complete `layout_config` JSON describing a poster/graphic design layout.
## What it does
**Input:** a user intent (e.g. "Create a promotional poster for a flash sale with bold colors and a clear call to action")

**Output:** a rich, structured `layout_config` JSON with:

- Canvas dimensions and background style
- Component tree of `GROUP`, `TEXT`, and `IMAGE` nodes
- Pixel-precise bounding boxes (`left`, `top`, `width`, `height`) for every element
- Full typography specs per text node (`fontFamily`, `fontSize`, `fontWeight`, `fontStyle`, `letterSpacing`, `lineHeight`, `color`, `textAlign`, etc.)
- CSS transforms (translate, rotate, scale) per component
- Image slot descriptors (`src`, `alt`, `data0_element_type`)
- Opacity and overflow settings
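To make the target format concrete, here is a minimal, hypothetical `layout_config` illustrating the fields listed above; the exact schema and nesting used in the training data may differ.

```python
import json

# A minimal, hypothetical layout_config covering the fields listed above.
# The exact schema of the released dataset may differ.
sample = """
{
  "layout_config": {
    "canvas": {"width": 1080, "height": 1440, "background": "#FFF4E6"},
    "components": [
      {
        "type": "TEXT",
        "boundingBox": {"left": 120, "top": 200, "width": 840, "height": 160},
        "style": {
          "fontFamily": "Inter",
          "fontSize": 96,
          "fontWeight": 800,
          "color": "#D62828",
          "textAlign": "center"
        },
        "transform": "rotate(0deg)",
        "opacity": 1.0,
        "content": "FLASH SALE"
      },
      {
        "type": "IMAGE",
        "boundingBox": {"left": 240, "top": 480, "width": 600, "height": 600},
        "src": "product.png",
        "alt": "hero product shot"
      }
    ]
  }
}
"""

layout = json.loads(sample)["layout_config"]
print(layout["canvas"]["width"], len(layout["components"]))
```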
This adapter is intended to be paired with the model's image generation capability to compose final poster images end-to-end.
## Model Details
- Developed by: DianneDaian
- Base model: inclusionAI/Ming-Lite-Omni
- Model type: Multimodal causal LM with LoRA adapter (PEFT)
- Fine-tune task: Layout config JSON generation from user intent + reference image
- Language: English
- License: Inherited from base model
## How to Use
```python
from transformers import AutoProcessor
from peft import PeftModel
import torch

base_model_path = "inclusionAI/Ming-Lite-Omni"
adapter_path = "DianneDaian/ming-lica-layout-lora"

# Load base model (see Ming-Lite-Omni docs for full loading code)
# model = load_ming_model(base_model_path)
# model = PeftModel.from_pretrained(model, adapter_path)

# Prompt format
prompt = """You are a layout design assistant. Given a user intent and a reference image of a layout,
output the full layout_config JSON that captures the design.
Output ONLY valid JSON starting with {"layout_config": ...}

User intent: <your intent here>"""
```
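Since the prompt instructs the model to emit only JSON, one simple way to recover the `layout_config` from generated text is to slice from the first `{` to the last `}` and parse. This helper is an illustrative sketch, not part of the released pipeline:

```python
import json

def extract_layout_config(generated_text: str) -> dict:
    """Pull the layout_config JSON out of raw model output.

    Assumes the model followed the prompt and emitted a single JSON
    object of the form {"layout_config": ...}; raises ValueError otherwise.
    """
    start = generated_text.find("{")
    end = generated_text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    payload = json.loads(generated_text[start:end + 1])
    if "layout_config" not in payload:
        raise ValueError("output JSON is missing the layout_config key")
    return payload["layout_config"]

# Example with a stubbed model response
raw = 'Sure! {"layout_config": {"canvas": {"width": 800, "height": 600}, "components": []}}'
config = extract_layout_config(raw)
print(config["canvas"])  # {'width': 800, 'height': 600}
```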
See `test_intent_to_poster_pipeline.py` in the project repo for the full end-to-end inference pipeline (intent → layout JSON → component image generation → final poster composition).
## Training Details

### Training Data
- Source: LICA layout dataset – 5,855 real graphic design layouts
- Format per sample: user intent (text) + reference layout image + full `layout_config` JSON (target)
- `layout_config` fields: components, style, typography, transforms, image slots
- Dataset file: `lica_layouts_rich_50k_train.jsonl`
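The per-sample format above can be read with a small JSONL iterator; note that the field names `intent`, `image`, and `layout_config` are assumptions about the dataset schema for illustration, not confirmed keys.

```python
import io
import json

def iter_samples(fileobj):
    """Yield (intent, image, layout_config) triples from a JSONL stream.

    The field names are assumed, not confirmed against the released file.
    """
    for line in fileobj:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield record["intent"], record["image"], record["layout_config"]

# Demo with an in-memory stand-in for the dataset file
fake_file = io.StringIO(
    '{"intent": "flash sale poster", "image": "ref_001.png", '
    '"layout_config": {"components": []}}\n'
)
for intent, image, layout in iter_samples(fake_file):
    print(intent, image)
```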
### Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 3 |
| Total steps | 2,196 |
| Per-device batch size | 1 |
| Gradient accumulation | 8 (effective batch = 8) |
| Learning rate | 1e-4 (cosine decay) |
| Max sequence length | 4096 tokens |
| Precision | bfloat16 + Flash Attention 2 |
| Loss function | Causal LM cross-entropy (response tokens only) |
| Optimizer | AdamW |
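"Response tokens only" means prompt tokens are excluded from the cross-entropy target, conventionally by setting their labels to `-100` so PyTorch's loss ignores them. A minimal sketch with dummy token IDs:

```python
IGNORE_INDEX = -100  # label value ignored by cross-entropy in PyTorch/Transformers

def build_labels(prompt_ids, response_ids):
    """Mask the prompt span so only response tokens contribute to the loss."""
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)

# Dummy token IDs standing in for a tokenized (prompt, layout JSON) pair
prompt_ids = [101, 7592, 2003]
response_ids = [1063, 1000, 17882, 1065]
labels = build_labels(prompt_ids, response_ids)
print(labels)  # [-100, -100, -100, 1063, 1000, 17882, 1065]
```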
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target modules | dense, query_key_value, down_proj, up_proj, gate_proj |
| Modules to save | lm_head |
| Trainable parameters | 907,935,744 (3.74% of 24.27B total) |
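The table above corresponds roughly to the following PEFT `LoraConfig`; treat it as a sketch, since the exact config object used in training is not published here.

```python
from peft import LoraConfig

# Illustrative LoraConfig matching the table above (not the exact
# object used during training).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["dense", "query_key_value", "down_proj", "up_proj", "gate_proj"],
    modules_to_save=["lm_head"],
    task_type="CAUSAL_LM",
)
```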
### Training Loss
| Step | Loss |
|---|---|
| 1 | 0.857 |
| 99 | 0.654 |
| 199 | 0.502 |
| 499 | 0.525 |
| 799 | 0.433 |
| 1099 | 0.514 |
| 1499 | 0.535 |
| 1799 | 0.525 |
| 2099 | 0.461 |
| 2196 | 0.438 |
Final train loss: 0.5405; trained for ~50 h 40 min on a single GPU.
### Compute
- Hardware: 1Γ GPU (A100-class)
- Training time: ~50 hours 40 minutes (182,401 seconds)
- Total FLOPs: 9.06 Γ 10ΒΉβΈ
- Adapter size: 3.0 GB
## Sample Outputs

See `batch_5_runs_rich.md` for 5 end-to-end runs with generated poster images and layout JSONs.
## Project Repository

[github.com/diannedaian/random-task-0207](https://github.com/diannedaian/random-task-0207)
## Framework Versions
- PEFT 0.18.1
- Transformers 4.x
- PyTorch (bfloat16)