Multimodal Alignment

- MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models (arXiv:2410.17637)
- Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization (arXiv:2411.10442)
- Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning (arXiv:2411.18203)
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models (arXiv:2411.14432)
- LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment (arXiv:2412.04814)
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment (arXiv:2412.19326)
- MLLM-as-a-Judge for Image Safety without Human Labeling (arXiv:2501.00192)
- InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model (arXiv:2501.12368)
- Temporal Preference Optimization for Long-Form Video Understanding (arXiv:2501.13919)
- The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering (arXiv:2502.03628)
- MM-RLHF: The Next Step Forward in Multimodal LLM Alignment (arXiv:2502.10391)
- video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model (arXiv:2502.11775)
- OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference (arXiv:2502.18411)
- Unified Reward Model for Multimodal Understanding and Generation (arXiv:2503.05236)
- VisualPRM: An Effective Process Reward Model for Multimodal Reasoning (arXiv:2503.10291)
- Aligning Multimodal LLM with Human Preference: A Survey (arXiv:2503.14504)
- Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation (arXiv:2503.19622)
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models (arXiv:2504.10479)
- Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling (arXiv:2504.13169)
- VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models (arXiv:2504.13122)
- Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization (arXiv:2504.12083)
- R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning (arXiv:2505.02835)
- Evaluating and Steering Modality Preferences in Multimodal Large Language Model (arXiv:2505.20977)
- Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs (arXiv:2506.21656)
- InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization (arXiv:2508.05731)
- Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success (arXiv:2508.04280)
- Do What? Teaching Vision-Language-Action Models to Reject the Impossible (arXiv:2508.16292)
- Reconstruction Alignment Improves Unified Multimodal Models (arXiv:2509.07295)
- Aligning Generative Music AI with Human Preferences: Methods and Challenges (arXiv:2511.15038)
- Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment (arXiv:2512.04356)
- Self-Improving VLM Judges Without Human Annotations (arXiv:2512.05145)
- ProGuard: Towards Proactive Multimodal Safeguard (arXiv:2512.23573)
- Factorized Learning for Temporally Grounded Video-Language Models (arXiv:2512.24097)
- OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs (arXiv:2601.01592)