Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Paper • 2604.06132 • Published 6 days ago • 111
GEMS: Agent-Native Multimodal Generation with Memory and Skills Paper • 2603.28088 • Published 14 days ago • 85
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published 18 days ago • 129 • 9
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published 18 days ago • 129
XSkill: Continual Learning from Experience and Skills in Multimodal Agents Paper • 2603.12056 • Published Mar 12 • 33
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios Paper • 2602.23166 • Published Feb 26 • 45
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads Paper • 2602.09443 • Published Feb 10 • 59
Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations Paper • 2602.05885 • Published Feb 5 • 28
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning Paper • 2602.04634 • Published Feb 4 • 99
LatentMem: Customizing Latent Memory for Multi-Agent Systems Paper • 2602.03036 • Published Feb 3 • 14
LatentMem: Customizing Latent Memory for Multi-Agent Systems Paper • 2602.03036 • Published Feb 3 • 14
Scaling Embeddings Outperforms Scaling Experts in Language Models Paper • 2601.21204 • Published Jan 29 • 102
AgentDoG Collection A Diagnostic Guardrail Framework for AI Agent Safety and Security • 10 items • Updated 4 days ago • 107