Ming-Lite-Omni β€” Rich Layout LoRA

A LoRA adapter fine-tuned on top of inclusionAI/Ming-Lite-Omni to teach the model layout design generation β€” given a natural-language user intent and a reference image, the model outputs a complete layout_config JSON describing a poster/graphic design layout.

What it does

Input: a user intent (e.g. "Create a promotional poster for a flash sale with bold colors and a clear call to action") and a reference image of a layout

Output: a rich, structured layout_config JSON with:

  • Canvas dimensions and background style
  • Component tree of GROUP, TEXT, and IMAGE nodes
  • Pixel-precise bounding boxes (left, top, width, height) for every element
  • Full typography specs per text node (fontFamily, fontSize, fontWeight, fontStyle, letterSpacing, lineHeight, color, textAlign, etc.)
  • CSS transforms (translate, rotate, scale) per component
  • Image slot descriptors (src, alt, data0_element_type)
  • Opacity and overflow settings

This adapter is intended to be paired with the model's image generation capability to compose final poster images end-to-end.
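For orientation, here is a minimal sketch of what such a layout_config can look like. This is not a real dataset sample: the nesting and field names (canvas, components, box, typography, transform, opacity) are assumptions inferred from the bullet list above, and the exact schema is defined by the LICA training data.

```python
import json

# Illustrative only: shape inferred from the field list above, not an actual
# sample from the LICA dataset.
layout_config = {
    "layout_config": {
        "canvas": {"width": 1080, "height": 1920, "background": "#111111"},
        "components": [
            {
                "type": "GROUP",
                "box": {"left": 0, "top": 0, "width": 1080, "height": 1920},
                "children": [
                    {
                        "type": "TEXT",
                        "box": {"left": 120, "top": 300, "width": 840, "height": 200},
                        "typography": {
                            "fontFamily": "Inter",
                            "fontSize": 96,
                            "fontWeight": 800,
                            "color": "#FFFFFF",
                            "textAlign": "center",
                        },
                        "transform": "rotate(0deg)",
                        "opacity": 1.0,
                        "content": "FLASH SALE",
                    },
                    {
                        "type": "IMAGE",
                        "box": {"left": 240, "top": 900, "width": 600, "height": 600},
                        "src": "component_0.png",
                        "alt": "product shot",
                    },
                ],
            }
        ],
    }
}

# The model is expected to emit this structure as a single JSON object.
print(json.dumps(layout_config)[:60])
```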

Model Details

  • Developed by: DianneDaian
  • Base model: inclusionAI/Ming-Lite-Omni
  • Model type: Multimodal causal LM with LoRA adapter (PEFT)
  • Fine-tune task: Layout config JSON generation from user intent + reference image
  • Language: English
  • License: Inherited from base model

How to Use

from transformers import AutoProcessor
from peft import PeftModel
import torch

base_model_path = "inclusionAI/Ming-Lite-Omni"
adapter_path = "DianneDaian/ming-lica-layout-lora"

# Load the processor and base model. Ming-Lite-Omni uses custom modeling code,
# so follow the loading steps in its docs (trust_remote_code is typically needed).
# processor = AutoProcessor.from_pretrained(base_model_path, trust_remote_code=True)
# model = load_ming_model(base_model_path)  # placeholder for the base model's loading code
# model = PeftModel.from_pretrained(model, adapter_path)

# Prompt format
prompt = """You are a layout design assistant. Given a user intent and a reference image of a layout,
output the full layout_config JSON that captures the design.
Output ONLY valid JSON starting with {"layout_config": ...}

User intent: <your intent here>"""

See test_intent_to_poster_pipeline.py in the project repo for the full end-to-end inference pipeline (intent β†’ layout JSON β†’ component image generation β†’ final poster composition).
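Even though the prompt instructs the model to output only JSON, decoded text can still carry stray tokens around the object. The helper below is a hypothetical post-processing sketch (not part of the repo): it scans from the first opening brace and tracks brace depth to pull out one balanced JSON object.

```python
import json

def extract_layout_config(decoded: str) -> dict:
    """Extract the first balanced JSON object from decoded model output.

    Hypothetical helper, not part of the project repo. Assumes no unescaped
    braces inside string values (brace depth is tracked character by character).
    """
    start = decoded.index("{")
    depth = 0
    for i, ch in enumerate(decoded[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Slice out the balanced object and parse it.
                return json.loads(decoded[start : i + 1])
    raise ValueError("unbalanced JSON in model output")

sample = 'Sure! {"layout_config": {"canvas": {"width": 800}}} <eos>'
print(extract_layout_config(sample)["layout_config"]["canvas"]["width"])  # -> 800
```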

Training Details

Training Data

  • Source: LICA layout dataset β€” 5,855 real graphic design layouts
  • Format per sample: user intent (text) + reference layout image + full layout_config JSON (target)
  • layout_config fields: components, style, typography, transforms, image slots
  • Dataset file: lica_layouts_rich_50k_train.jsonl

Training Hyperparameters

  Parameter                 Value
  Epochs                    3
  Total steps               2,196
  Per-device batch size     1
  Gradient accumulation     8 (effective batch = 8)
  Learning rate             1e-4 (cosine decay)
  Max sequence length       4096 tokens
  Precision                 bfloat16 + Flash Attention 2
  Loss function             Causal LM cross-entropy (response tokens only)
  Optimizer                 AdamW
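The reported step count is consistent with the dataset size: 5,855 samples over 3 epochs with an effective batch of 8, assuming partial batches are kept (each epoch rounds up):

```python
import math

samples, epochs, effective_batch = 5855, 3, 8
steps_per_epoch = math.ceil(samples / effective_batch)  # 732 (partial batch kept)
total_steps = steps_per_epoch * epochs
print(total_steps)  # -> 2196, matching the reported total
```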

LoRA Configuration

  Parameter              Value
  Rank (r)               16
  Alpha                  32
  Dropout                0.05
  Target modules         dense, query_key_value, down_proj, up_proj, gate_proj
  Modules to save        lm_head
  Trainable parameters   907,935,744 (3.74% of 24.27B total)
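The trainable-parameter count is unusually large for a rank-16 adapter because lm_head is saved in full rather than adapted; the reported percentage itself checks out:

```python
trainable = 907_935_744   # LoRA weights plus the fully saved lm_head
total = 24.27e9           # total parameters as reported for the base model
print(round(trainable / total * 100, 2))  # -> 3.74
```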

Training Loss

  Step    Loss
  1       0.857
  99      0.654
  199     0.502
  499     0.525
  799     0.433
  1099    0.514
  1499    0.535
  1799    0.525
  2099    0.461
  2196    0.438

Final train loss: 0.5405. Training took ~50 h 40 min on a single GPU.

Compute

  • Hardware: 1Γ— GPU (A100-class)
  • Training time: ~50 hours 40 minutes (182,401 seconds)
  • Total FLOPs: 9.06 Γ— 10¹⁸
  • Adapter size: 3.0 GB
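As a sanity check, the reported FLOPs and wall time (182,401 s matches ~50 h 40 min) imply roughly 50 TFLOP/s sustained, a plausible figure for bfloat16 training on a single A100-class GPU:

```python
total_flops = 9.06e18   # reported total training FLOPs
wall_seconds = 182_401  # reported wall time in seconds
throughput = total_flops / wall_seconds
print(f"{throughput:.3g} FLOP/s")  # roughly 5e13 FLOP/s, i.e. ~50 TFLOP/s
```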

Sample Outputs

See batch_5_runs_rich.md for 5 end-to-end runs with generated poster images and layout JSONs.

Project Repository

github.com/diannedaian/random-task-0207

Framework Versions

  • PEFT 0.18.1
  • Transformers 4.x
  • PyTorch (bfloat16)