PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models Paper • 2605.20873 • Published 1 day ago • 1
SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents Paper • 2605.21384 • Published 1 day ago • 2
Mem-π: Adaptive Memory through Learning When and What to Generate Paper • 2605.21463 • Published 1 day ago • 1
Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning Paper • 2605.21487 • Published 1 day ago • 8
SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects Paper • 2605.19587 • Published 2 days ago
OpenComputer: Verifiable Software Worlds for Computer-Use Agents Paper • 2605.19769 • Published 2 days ago • 52
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration Paper • 2605.20025 • Published 2 days ago • 57
Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding Paper • 2605.20104 • Published 2 days ago • 4
Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models Paper • 2605.18601 • Published 3 days ago • 3
AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents Paper • 2605.16819 • Published 5 days ago • 2
AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents Paper • 2605.17933 • Published 3 days ago • 5
WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes Paper • 2605.15843 • Published 6 days ago • 4
FFAvatar: Few-Shot, Feed-Forward, and Generalizable Avatar Reconstruction Paper • 2605.15320 • Published 7 days ago • 5
Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design Paper • 2605.15871 • Published 6 days ago • 14
Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization Paper • 2605.15980 • Published 6 days ago • 32
Look Before You Leap: Autonomous Exploration for LLM Agents Paper • 2605.16143 • Published 6 days ago • 7