My thing
MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models
Paper
• 2511.18373
• Published • 7
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
Paper
• 2511.13288
• Published • 19
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
Paper
• 2511.19418
• Published • 29
SAM 3: Segment Anything with Concepts
Paper
• 2511.16719
• Published • 134
Temporal Prompting Matters: Rethinking Referring Video Object Segmentation
Paper
• 2510.07319
• Published • 3
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper
• 2511.16334
• Published • 94
O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents
Paper
• 2511.13593
• Published • 28
RynnVLA-002: A Unified Vision-Language-Action and World Model
Paper
• 2511.17502
• Published • 28
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models
Paper
• 2511.11007
• Published • 15
Depth Anything 3: Recovering the Visual Space from Any Views
Paper
• 2511.10647
• Published • 101
LightRAG: Simple and Fast Retrieval-Augmented Generation
Paper
• 2410.05779
• Published • 34
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper
• 2510.14528
• Published • 122
TradingAgents: Multi-Agents LLM Financial Trading Framework
Paper
• 2412.20138
• Published • 33
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Paper
• 2410.17799
• Published • 12
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image
Paper
• 2511.13648
• Published • 53
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper
• 2509.22186
• Published • 154
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
Paper
• 2510.15869
• Published • 50
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
Paper
• 2511.15705
• Published • 98
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
Paper
• 2510.22543
• Published • 14
Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning
Paper
• 2511.19900
• Published • 49
From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models
Paper
• 2512.10867
• Published • 16
Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities
Paper
• 2503.04721
• Published • 4
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
Paper
• 2603.17187
• Published • 134
MosaicMem: Hybrid Spatial Memory for Controllable Video World Models
Paper
• 2603.17117
• Published • 87
Complementary Reinforcement Learning
Paper
• 2603.17621
• Published • 36
AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents
Paper
• 2603.16496
• Published • 13
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs
Paper
• 2603.18004
• Published • 12
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence
Paper
• 2603.13398
• Published • 152
Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation
Paper
• 2603.16669
• Published • 70
Efficient Reasoning on the Edge
Paper
• 2603.16867
• Published • 18
SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation
Paper
• 2603.16864
• Published • 16
Omnilingual MT: Machine Translation for 1,600 Languages
Paper
• 2603.16309
• Published • 20
ViT-AdaLA: Adapting Vision Transformers with Linear Attention
Paper
• 2603.16063
• Published • 2
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
Paper
• 2603.15594
• Published • 148
Mixture-of-Depths Attention
Paper
• 2603.15619
• Published • 79
Can Vision-Language Models Solve the Shell Game?
Paper
• 2603.08436
• Published • 39
Multimodal OCR: Parse Anything from Documents
Paper
• 2603.13032
• Published • 40
NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval
Paper
• 2603.12824
• Published • 5