TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation Paper • 2603.19039 • Published 12 days ago • 50
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models Paper • 2603.15557 • Published 15 days ago • 28
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published Feb 2 • 140
Video-BrowseComp: Benchmarking Agentic Video Research on Open Web Paper • 2512.23044 • Published Dec 28, 2025 • 10
view article Article TimeScope: How Long Can Your Video Large Multimodal Model Go? +2 Jul 23, 2025 • 48
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published Nov 11, 2025 • 39
Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31, 2025 • 76
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Paper • 2412.14475 • Published Dec 19, 2024 • 57