AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents Paper • 2603.14465 • Published Mar 15 • 23
LawThinker: A Deep Research Legal Agent in Dynamic Environments Paper • 2602.12056 • Published Feb 12 • 34
Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use Paper • 2602.11541 • Published Feb 12 • 2
DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents Paper • 2602.07035 • Published Feb 3 • 30
GISA: A Benchmark for General Information-Seeking Assistant Paper • 2602.08543 • Published Feb 9 • 26
Deep Search with Hierarchical Meta-Cognitive Monitoring Inspired by Cognitive Neuroscience Paper • 2601.23188 • Published Jan 30 • 9
DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution Paper • 2601.13761 • Published Jan 20 • 16
When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs Paper • 2601.11000 • Published Jan 16 • 27
Deriving Character Logic from Storyline as Codified Decision Trees Paper • 2601.10080 • Published Jan 15 • 6
MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching Paper • 2601.10712 • Published Jan 15 • 24
ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration Paper • 2601.06860 • Published Jan 11 • 16
ROI-Reasoning: Rational Optimization for Inference via Pre-Computation Meta-Cognition Paper • 2601.03822 • Published Jan 7 • 24
PhononBench: A Large-Scale Phonon-Based Benchmark for Dynamical Stability in Crystal Generation Paper • 2512.21227 • Published Dec 24, 2025 • 2
LLaDA2.0: Scaling Up Diffusion Language Models to 100B Paper • 2512.15745 • Published Dec 10, 2025 • 88
From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models Paper • 2512.10867 • Published Dec 11, 2025 • 16