Backlog
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper
• 2503.01785
• Published
• 86
When an LLM is apprehensive about its answers -- and when its
uncertainty is justified
Paper
• 2503.01688
• Published
• 21
Predictive Data Selection: The Data That Predicts Is the Data That
Teaches
Paper
• 2503.00808
• Published
• 56
Chain of Draft: Thinking Faster by Writing Less
Paper
• 2502.18600
• Published
• 50
Multi-Turn Code Generation Through Single-Step Rewards
Paper
• 2502.20380
• Published
• 32
Self-rewarding correction for mathematical reasoning
Paper
• 2502.19613
• Published
• 82
MPO: Boosting LLM Agents with Meta Plan Optimization
Paper
• 2503.02682
• Published
• 29
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep
Thinking
Paper
• 2501.04519
• Published
• 288
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model
Post-training
Paper
• 2501.17161
• Published
• 124
Evolving Deeper LLM Thinking
Paper
• 2501.09891
• Published
• 115
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Paper
• 2310.12823
• Published
• 36
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
• 2501.05366
• Published
• 102
The Lessons of Developing Process Reward Models in Mathematical
Reasoning
Paper
• 2501.07301
• Published
• 100
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language
Models
Paper
• 2501.03262
• Published
• 104
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta
Chain-of-Thought
Paper
• 2501.04682
• Published
• 99
Agent Laboratory: Using LLM Agents as Research Assistants
Paper
• 2501.04227
• Published
• 95
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse
Task Synthesis
Paper
• 2412.19723
• Published
• 87
GuardReasoner: Towards Reasoning-based LLM Safeguards
Paper
• 2501.18492
• Published
• 88
Towards Best Practices for Open Datasets for LLM Training
Paper
• 2501.08365
• Published
• 62
Qwen2.5 Technical Report
Paper
• 2412.15115
• Published
• 377
RobustFT: Robust Supervised Fine-tuning for Large Language Models under
Noisy Response
Paper
• 2412.14922
• Published
• 88
Training Large Language Models to Reason in a Continuous Latent Space
Paper
• 2412.06769
• Published
• 94
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Paper
• 2411.04905
• Published
• 127
Training Language Models to Self-Correct via Reinforcement Learning
Paper
• 2409.12917
• Published
• 140
Survey on Evaluation of LLM-based Agents
Paper
• 2503.16416
• Published
• 96
Large Language Model Agent: A Survey on Methodology, Applications and
Challenges
Paper
• 2503.21460
• Published
• 83
A Survey of Efficient Reasoning for Large Reasoning Models: Language,
Multimodality, and Beyond
Paper
• 2503.21614
• Published
• 43
ScholarCopilot: Training Large Language Models for Academic Writing with
Accurate Citations
Paper
• 2504.00824
• Published
• 43
Advances and Challenges in Foundation Agents: From Brain-Inspired
Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper
• 2504.01990
• Published
• 303
Inference-Time Scaling for Generalist Reward Modeling
Paper
• 2504.02495
• Published
• 58
Agentic Knowledgeable Self-awareness
Paper
• 2504.03553
• Published
• 27
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training
Tokens
Paper
• 2504.07096
• Published
• 77
Missing Premise exacerbates Overthinking: Are Reasoning Models losing
Critical Thinking Skill?
Paper
• 2504.06514
• Published
• 39
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper
• 2504.07128
• Published
• 87
Iterative Self-Training for Code Generation via Reinforced Re-Ranking
Paper
• 2504.09643
• Published
• 34
Breaking the Data Barrier -- Building GUI Agents Through Task
Generalization
Paper
• 2504.10127
• Published
• 17
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper
• 2504.10481
• Published
• 85
Genius: A Generalizable and Purely Unsupervised Self-Training Framework
For Advanced Reasoning
Paper
• 2504.08672
• Published
• 55