LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 25, 2025 • 182
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 20, 2025 • 92
Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark Paper • 2511.13853 • Published Nov 17, 2025 • 34
Depth Anything 3: Recovering the Visual Space from Any Views Paper • 2511.10647 • Published Nov 13, 2025 • 96
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation Paper • 2511.09611 • Published Nov 12, 2025 • 69
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling Paper • 2511.11793 • Published Nov 14, 2025 • 183
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19, 2025 • 134
Disentangling Writer and Character Styles for Handwriting Generation Paper • 2303.14736 • Published Mar 26, 2023 • 3
HiPA: Enabling One-Step Text-to-Image Diffusion Models via High-Frequency-Promoting Adaptation Paper • 2311.18158 • Published Nov 30, 2023
Fully Test-Time Adaptation for Monocular 3D Object Detection Paper • 2405.19682 • Published May 30, 2024
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation Paper • 2412.04448 • Published Dec 5, 2024 • 10
One-Shot Diffusion Mimicker for Handwritten Text Generation Paper • 2409.04004 • Published Sep 6, 2024
Poison-splat: Computation Cost Attack on 3D Gaussian Splatting Paper • 2410.08190 • Published Oct 10, 2024 • 1
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation Paper • 2503.11122 • Published Mar 14, 2025