paper seminar_251001 - a jmkim0309 Collection

updated Oct 24, 2025

Reconstruction Alignment Improves Unified Multimodal Models
Paper • 2509.07295 • Published Sep 8, 2025

F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
Paper • 2509.06951 • Published Sep 8, 2025

UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward
Paper • 2509.06818 • Published Sep 8, 2025

Interleaving Reasoning for Better Text-to-Image Generation
Paper • 2509.06945 • Published Sep 8, 2025

RewardDance: Reward Scaling in Visual Generation
Paper • 2509.08826 • Published Sep 10, 2025

Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling
Paper • 2509.01624 • Published Sep 1, 2025

Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
Paper • 2509.06942 • Published Sep 8, 2025

Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
Paper • 2509.15185 • Published Sep 18, 2025

LLM-I: LLMs are Naturally Interleaved Multimodal Creators
Paper • 2509.13642 • Published Sep 17, 2025

Image Tokenizer Needs Post-Training
Paper • 2509.12474 • Published Sep 15, 2025

InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Paper • 2509.10441 • Published Sep 12, 2025

HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
Paper • 2509.08519 • Published Sep 10, 2025

MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
Paper • 2509.01977 • Published Sep 2, 2025

GenCompositor: Generative Video Compositing with Diffusion Transformer
Paper • 2509.02460 • Published Sep 2, 2025

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Paper • 2508.20751 • Published Aug 28, 2025

Mixture of Contexts for Long Video Generation
Paper • 2508.21058 • Published Aug 28, 2025

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
Paper • 2509.16197 • Published Sep 19, 2025

Lynx: Towards High-Fidelity Personalized Video Generation
Paper • 2509.15496 • Published Sep 19, 2025

OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models
Paper • 2509.17627 • Published Sep 22, 2025

Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
Paper • 2509.19244 • Published Sep 23, 2025

Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
Paper • 2509.18824 • Published Sep 23, 2025

VChain: Chain-of-Visual-Thought for Reasoning in Video Generation
Paper • 2510.05094 • Published Oct 6, 2025

Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs
Paper • 2509.25771 • Published Sep 30, 2025

Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation
Paper • 2510.01284 • Published Sep 30, 2025

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
Paper • 2510.02283 • Published Oct 2, 2025

UltraGen: High-Resolution Video Generation with Hierarchical Attention
Paper • 2510.18775 • Published Oct 21, 2025