# Ming-Lite-Omni – Rich Layout LoRA

A LoRA adapter fine-tuned on top of inclusionAI/Ming-Lite-Omni to teach the model layout design generation: given a natural-language user intent and a reference image, the model outputs a complete `layout_config` JSON describing a poster/graphic design layout.
## What it does
**Input:** a user intent (e.g. "Create a promotional poster for a flash sale with bold colors and a clear call to action")

**Output:** a rich, structured `layout_config` JSON with:

- Canvas dimensions and background style
- Component tree of `GROUP`, `TEXT`, and `IMAGE` nodes
- Pixel-precise bounding boxes (`left`, `top`, `width`, `height`) for every element
- Full typography specs per text node (`fontFamily`, `fontSize`, `fontWeight`, `fontStyle`, `letterSpacing`, `lineHeight`, `color`, `textAlign`, etc.)
- CSS transforms (translate, rotate, scale) per component
- Image slot descriptors (`src`, `alt`, `data0_element_type`)
- Opacity and overflow settings
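To make the target format concrete, here is a minimal, hypothetical `layout_config` illustrating the fields listed above; the exact schema and nesting used in the training data may differ.

```python
import json

# A minimal, hypothetical layout_config covering the fields listed above.
# The exact schema of the released dataset may differ.
sample = """
{
  "layout_config": {
    "canvas": {"width": 1080, "height": 1440, "background": "#FFF4E6"},
    "components": [
      {
        "type": "TEXT",
        "boundingBox": {"left": 120, "top": 200, "width": 840, "height": 160},
        "style": {
          "fontFamily": "Inter",
          "fontSize": 96,
          "fontWeight": 800,
          "color": "#D62828",
          "textAlign": "center"
        },
        "transform": "rotate(0deg)",
        "opacity": 1.0,
        "content": "FLASH SALE"
      },
      {
        "type": "IMAGE",
        "boundingBox": {"left": 240, "top": 480, "width": 600, "height": 600},
        "src": "product.png",
        "alt": "hero product shot"
      }
    ]
  }
}
"""

layout = json.loads(sample)["layout_config"]
print(layout["canvas"]["width"], len(layout["components"]))
```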
This adapter is intended to be paired with the model's image generation capability to compose final poster images end-to-end.
## Model Details
- Developed by: DianneDaian
- Base model: inclusionAI/Ming-Lite-Omni
- Model type: Multimodal causal LM with LoRA adapter (PEFT)
- Fine-tune task: Layout config JSON generation from user intent + reference image
- Language: English
- License: Inherited from base model
## How to Use
```python
from transformers import AutoProcessor
from peft import PeftModel
import torch

base_model_path = "inclusionAI/Ming-Lite-Omni"
adapter_path = "DianneDaian/ming-lica-layout-lora"

# Load base model (see Ming-Lite-Omni docs for full loading code)
# model = load_ming_model(base_model_path)
# model = PeftModel.from_pretrained(model, adapter_path)

# Prompt format
prompt = """You are a layout design assistant. Given a user intent and a reference image of a layout,
output the full layout_config JSON that captures the design.
Output ONLY valid JSON starting with {"layout_config": ...}

User intent: <your intent here>"""
```
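Since the prompt instructs the model to emit only JSON, one simple way to recover the `layout_config` from generated text is to slice from the first `{` to the last `}` and parse. This helper is an illustrative sketch, not part of the released pipeline:

```python
import json

def extract_layout_config(generated_text: str) -> dict:
    """Pull the layout_config JSON out of raw model output.

    Assumes the model followed the prompt and emitted a single JSON
    object of the form {"layout_config": ...}; raises ValueError otherwise.
    """
    start = generated_text.find("{")
    end = generated_text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    payload = json.loads(generated_text[start:end + 1])
    if "layout_config" not in payload:
        raise ValueError("output JSON is missing the layout_config key")
    return payload["layout_config"]

# Example with a stubbed model response
raw = 'Sure! {"layout_config": {"canvas": {"width": 800, "height": 600}, "components": []}}'
config = extract_layout_config(raw)
print(config["canvas"])  # {'width': 800, 'height': 600}
```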
See `test_intent_to_poster_pipeline.py` in the project repo for the full end-to-end inference pipeline (intent → layout JSON → component image generation → final poster composition).
## Training Details

### Training Data
- Source: LICA layout dataset – 5,855 real graphic design layouts
- Format per sample: user intent (text) + reference layout image + full `layout_config` JSON (target)
- `layout_config` fields: components, style, typography, transforms, image slots
- Dataset file: `lica_layouts_rich_50k_train.jsonl`
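The per-sample format above can be read with a small JSONL iterator; note that the field names `intent`, `image`, and `layout_config` are assumptions about the dataset schema for illustration, not confirmed keys.

```python
import io
import json

def iter_samples(fileobj):
    """Yield (intent, image, layout_config) triples from a JSONL stream.

    The field names are assumed, not confirmed against the released file.
    """
    for line in fileobj:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield record["intent"], record["image"], record["layout_config"]

# Demo with an in-memory stand-in for the dataset file
fake_file = io.StringIO(
    '{"intent": "flash sale poster", "image": "ref_001.png", '
    '"layout_config": {"components": []}}\n'
)
for intent, image, layout in iter_samples(fake_file):
    print(intent, image)
```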
### Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 3 |
| Total steps | 2,196 |
| Per-device batch size | 1 |
| Gradient accumulation | 8 (effective batch = 8) |
| Learning rate | 1e-4 (cosine decay) |
| Max sequence length | 4096 tokens |
| Precision | bfloat16 + Flash Attention 2 |
| Loss function | Causal LM cross-entropy (response tokens only) |
| Optimizer | AdamW |
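"Response tokens only" means prompt tokens are excluded from the cross-entropy target, conventionally by setting their labels to `-100` so PyTorch's loss ignores them. A minimal sketch with dummy token IDs:

```python
IGNORE_INDEX = -100  # label value ignored by cross-entropy in PyTorch/Transformers

def build_labels(prompt_ids, response_ids):
    """Mask the prompt span so only response tokens contribute to the loss."""
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)

# Dummy token IDs standing in for a tokenized (prompt, layout JSON) pair
prompt_ids = [101, 7592, 2003]
response_ids = [1063, 1000, 17882, 1065]
labels = build_labels(prompt_ids, response_ids)
print(labels)  # [-100, -100, -100, 1063, 1000, 17882, 1065]
```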
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target modules | dense, query_key_value, down_proj, up_proj, gate_proj |
| Modules to save | lm_head |
| Trainable parameters | 907,935,744 (3.74% of 24.27B total) |
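The table above corresponds roughly to the following PEFT `LoraConfig`; treat it as a sketch, since the exact config object used in training is not published here.

```python
from peft import LoraConfig

# Illustrative LoraConfig matching the table above (not the exact
# object used during training).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["dense", "query_key_value", "down_proj", "up_proj", "gate_proj"],
    modules_to_save=["lm_head"],
    task_type="CAUSAL_LM",
)
```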
### Training Loss
| Step | Loss |
|---|---|
| 1 | 0.857 |
| 99 | 0.654 |
| 199 | 0.502 |
| 499 | 0.525 |
| 799 | 0.433 |
| 1099 | 0.514 |
| 1499 | 0.535 |
| 1799 | 0.525 |
| 2099 | 0.461 |
| 2196 | 0.438 |
Final train loss: 0.5405; trained for ~50 h 40 min on a single GPU.
### Compute
- Hardware: 1Γ GPU (A100-class)
- Training time: ~50 hours 40 minutes (182,401 seconds)
- Total FLOPs: 9.06 Γ 10ΒΉβΈ
- Adapter size: 3.0 GB
## Sample Outputs

See `batch_5_runs_rich.md` for 5 end-to-end runs with generated poster images and layout JSONs.
## Project Repository

[github.com/diannedaian/random-task-0207](https://github.com/diannedaian/random-task-0207)
## Framework Versions
- PEFT 0.18.1
- Transformers 4.x
- PyTorch (bfloat16)