OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published 8 days ago • 298
Show, Don't Tell: Morphing Latent Reasoning into Image Generation Paper • 2602.02227 • Published 11 days ago • 10
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery Paper • 2601.19325 • Published 17 days ago • 79
A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning Paper • 2512.14442 • Published Dec 16, 2025 • 11
DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation Paper • 2511.23127 • Published Nov 28, 2025 • 44
Accelerating Streaming Video Large Language Models via Hierarchical Token Compression Paper • 2512.00891 • Published Nov 30, 2025 • 16
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models Paper • 2511.13704 • Published Nov 17, 2025 • 43
Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks Paper • 2510.25760 • Published Oct 29, 2025 • 17
Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1 Paper • 2510.19600 • Published Oct 22, 2025 • 69
AI for Service: Proactive Assistance with AI Glasses Paper • 2510.14359 • Published Oct 16, 2025 • 77
Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods Paper • 2510.07143 • Published Oct 8, 2025 • 13
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness Paper • 2503.18445 • Published Mar 24, 2025 • 1
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation Paper • 2510.00515 • Published Oct 1, 2025 • 42
PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era Paper • 2509.12989 • Published Sep 16, 2025 • 28