Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation Paper • 2603.12793 • Published 21 days ago • 38
Imagination Helps Visual Reasoning, But Not Yet in Latent Space Paper • 2602.22766 • Published Feb 26 • 42
MM-UAVBench: How Well Do Multimodal Large Language Models See, Think, and Plan in Low-Altitude UAV Scenarios? Paper • 2512.23219 • Published Dec 29, 2025 • 4
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2, 2025 • 190
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14, 2025 • 308
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding Paper • 2503.12797 • Published Mar 17, 2025 • 32