kaizuberbuehler's Collection: Reasoning, Thinking, RL and Test-Time Scaling
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via
Collective Monte Carlo Tree Search
Paper
• 2412.18319
• Published
• 39
Token-Budget-Aware LLM Reasoning
Paper
• 2412.18547
• Published
• 46
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper
• 2412.20993
• Published
• 36
B-STaR: Monitoring and Balancing Exploration and Exploitation in
Self-Taught Reasoners
Paper
• 2412.17256
• Published
• 47
Paper
• 2412.16720
• Published
• 37
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
Paper
• 2412.17498
• Published
• 22
Outcome-Refining Process Supervision for Code Generation
Paper
• 2412.15118
• Published
• 19
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's
Reasoning Capability
Paper
• 2411.19943
• Published
• 62
MALT: Improving Reasoning with Multi-Agent LLM Training
Paper
• 2412.01928
• Published
• 45
Mars-PO: Multi-Agent Reasoning System Preference Optimization
Paper
• 2411.19039
• Published
• 1
Flow-DPO: Improving LLM Mathematical Reasoning through Online
Multi-Agent Learning
Paper
• 2410.22304
• Published
• 18
o1-Coder: an o1 Replication for Coding
Paper
• 2412.00154
• Published
• 44
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper
• 2411.14405
• Published
• 61
OpenR: An Open Source Framework for Advanced Reasoning with Large
Language Models
Paper
• 2410.09671
• Published
• 1
SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree
Search for Code Generation
Paper
• 2411.11053
• Published
• 4
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context
Learning via MCTS
Paper
• 2411.18478
• Published
• 37
Reverse Thinking Makes LLMs Stronger Reasoners
Paper
• 2411.19865
• Published
• 23
Enhancing LLM Reasoning via Critique Models with Test-Time and
Training-Time Supervision
Paper
• 2411.16579
• Published
• 3
Vision-Language Models Can Self-Improve Reasoning via Reflection
Paper
• 2411.00855
• Published
• 5
Language Models are Hidden Reasoners: Unlocking Latent Reasoning
Capabilities via Self-Rewarding
Paper
• 2411.04282
• Published
• 37
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large
Language Models
Paper
• 2411.14432
• Published
• 25
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Paper
• 2411.18203
• Published
• 40
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple
Distillation, Big Progress or Bitter Lesson?
Paper
• 2411.16489
• Published
• 45
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained
Video Reasoning via Core Frame Selection
Paper
• 2411.14794
• Published
• 13
Enhancing the Reasoning Ability of Multimodal Large Language Models via
Mixed Preference Optimization
Paper
• 2411.10442
• Published
• 87
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level
Mathematical Reasoning
Paper
• 2410.02884
• Published
• 54
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper
• 2411.10440
• Published
• 129
Large Language Models Can Self-Improve in Long-context Reasoning
Paper
• 2411.08147
• Published
• 65
Self-Consistency Preference Optimization
Paper
• 2411.04109
• Published
• 19
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep
Thinking
Paper
• 2501.04519
• Published
• 288
URSA: Understanding and Verifying Chain-of-thought Reasoning in
Multimodal Mathematics
Paper
• 2501.04686
• Published
• 53
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta
Chain-of-Thought
Paper
• 2501.04682
• Published
• 99
BoostStep: Boosting mathematical capability of Large Language Models via
improved single-step reasoning
Paper
• 2501.03226
• Published
• 43
Test-time Computing: from System-1 Thinking to System-2 Thinking
Paper
• 2501.02497
• Published
• 45
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper
• 2501.01904
• Published
• 33
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Paper
• 2412.21187
• Published
• 40
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
• 2501.05366
• Published
• 102
The Lessons of Developing Process Reward Models in Mathematical
Reasoning
Paper
• 2501.07301
• Published
• 100
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical
Reasoning
Paper
• 2501.06458
• Published
• 31
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Paper
• 2501.06186
• Published
• 65
OmniThink: Expanding Knowledge Boundaries in Machine Writing through
Thinking
Paper
• 2501.09751
• Published
• 46
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with
Large Language Models
Paper
• 2501.09686
• Published
• 41
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
Reinforcement Learning
Paper
• 2501.12948
• Published
• 440
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper
• 2501.12599
• Published
• 126
s1: Simple test-time scaling
Paper
• 2501.19393
• Published
• 124
Demystifying Long Chain-of-Thought Reasoning in LLMs
Paper
• 2502.03373
• Published
• 58
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time
Scaling
Paper
• 2502.06703
• Published
• 152
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model
Post-training
Paper
• 2501.17161
• Published
• 124
On the Emergence of Thinking in LLMs I: Searching for the Right
Intuition
Paper
• 2502.06773
• Published
• 1
Competitive Programming with Large Reasoning Models
Paper
• 2502.06807
• Published
• 69
Evolving Deeper LLM Thinking
Paper
• 2501.09891
• Published
• 115
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs)
More Self-Confident Even When They Are Wrong
Paper
• 2501.09775
• Published
• 32
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward
Model
Paper
• 2501.12368
• Published
• 45
Reasoning Language Models: A Blueprint
Paper
• 2501.11223
• Published
• 33
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Paper
• 2501.12570
• Published
• 28
Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament
Paper
• 2501.13007
• Published
• 19
Can We Generate Images with CoT? Let's Verify and Reinforce Image
Generation Step by Step
Paper
• 2501.13926
• Published
• 43
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary
Feedback
Paper
• 2501.10799
• Published
• 15
Chain-of-Retrieval Augmented Generation
Paper
• 2501.14342
• Published
• 58
RL + Transformer = A General-Purpose Problem Solver
Paper
• 2501.14176
• Published
• 28
Towards General-Purpose Model-Free Reinforcement Learning
Paper
• 2501.16142
• Published
• 31
Atla Selene Mini: A General Purpose Evaluation Model
Paper
• 2501.17195
• Published
• 35
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper
• 2501.18585
• Published
• 61
Large Language Models Think Too Fast To Explore Effectively
Paper
• 2501.18009
• Published
• 23
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Paper
• 2501.19324
• Published
• 39
Process Reinforcement through Implicit Rewards
Paper
• 2502.01456
• Published
• 62
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Paper
• 2502.13124
• Published
• 7
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
Paper
• 2502.01718
• Published
• 28
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM
Reasoning via Autoregressive Search
Paper
• 2502.02508
• Published
• 22
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Paper
• 2502.02584
• Published
• 16
LIMO: Less is More for Reasoning
Paper
• 2502.03387
• Published
• 62
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper
• 2502.02339
• Published
• 23
On Teacher Hacking in Language Model Distillation
Paper
• 2502.02671
• Published
• 18
Token Assorted: Mixing Latent and Text Tokens for Improved Language
Model Reasoning
Paper
• 2502.03275
• Published
• 18
Gold-medalist Performance in Solving Olympiad Geometry with
AlphaGeometry2
Paper
• 2502.03544
• Published
• 44
BOLT: Bootstrap Long Chain-of-Thought in Language Models without
Distillation
Paper
• 2502.03860
• Published
• 25
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth
Approach
Paper
• 2502.05171
• Published
• 152
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM
Guardrails
Paper
• 2502.05163
• Published
• 22
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of
Language Models
Paper
• 2502.04404
• Published
• 25
Generating Symbolic World Models via Test-time Scaling of Large Language
Models
Paper
• 2502.04728
• Published
• 19
Exploring the Limit of Outcome Reward for Learning Mathematical
Reasoning
Paper
• 2502.06781
• Published
• 58
Training Language Models for Social Deduction with Multi-Agent
Reinforcement Learning
Paper
• 2502.06060
• Published
• 38
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
Paper
• 2502.06772
• Published
• 21
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
Paper
• 2502.07316
• Published
• 50
LLMs Can Easily Learn to Reason from Demonstrations: Structure, not
content, is what matters!
Paper
• 2502.07374
• Published
• 40
Teaching Language Models to Critique via Reinforcement Learning
Paper
• 2502.03492
• Published
• 24
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance
Paper
• 2502.08127
• Published
• 59
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to
Enhance RL Fine-Tuning
Paper
• 2502.06533
• Published
• 17
Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging -- An Open Recipe
Paper
• 2502.09056
• Published
• 31
SelfCite: Self-Supervised Alignment for Context Attribution in Large
Language Models
Paper
• 2502.09604
• Published
• 37
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for
Reasoning Quality, Robustness, and Efficiency
Paper
• 2502.09621
• Published
• 28
Logical Reasoning in Large Language Models: A Survey
Paper
• 2502.09100
• Published
• 24
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced
Chain-of-Thought in Large Language Models
Paper
• 2502.09390
• Published
• 16
Typhoon T1: An Open Thai Reasoning Model
Paper
• 2502.09042
• Published
• 16
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
Paper
• 2502.09601
• Published
• 14
Mathematical Reasoning in Large Language Models: Assessing Logical and
Arithmetic Errors across Wide Numerical Ranges
Paper
• 2502.08680
• Published
• 11
Small Models Struggle to Learn from Strong Reasoners
Paper
• 2502.12143
• Published
• 39
S*: Test Time Scaling for Code Generation
Paper
• 2502.14382
• Published
• 63
Diverse Inference and Verification for Advanced Reasoning
Paper
• 2502.09955
• Published
• 18
Search-R1: Training LLMs to Reason and Leverage Search Engines with
Reinforcement Learning
Paper
• 2503.09516
• Published
• 38
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open
Software Evolution
Paper
• 2502.18449
• Published
• 75
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement
Learning
Paper
• 2502.14768
• Published
• 47
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via
GRPO
Paper
• 2502.14669
• Published
• 15
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning
in Diffusion Models
Paper
• 2502.10458
• Published
• 38
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement
Learning
Paper
• 2502.12853
• Published
• 29
Thinking Preference Optimization
Paper
• 2502.13173
• Published
• 17
Self-rewarding correction for mathematical reasoning
Paper
• 2502.19613
• Published
• 82
Can Large Language Models Detect Errors in Long Chain-of-Thought
Reasoning?
Paper
• 2502.19361
• Published
• 28
LightThinker: Thinking Step-by-Step Compression
Paper
• 2502.15589
• Published
• 31
Reinforcement Learning for Reasoning in Small LLMs: What Works and What
Doesn't
Paper
• 2503.16219
• Published
• 52
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning
Learning
Paper
• 2502.19735
• Published
• 9
PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning
Trajectories for Complex Problem Solving
Paper
• 2502.16111
• Published
• 9
TAG: A Decentralized Framework for Multi-Agent Hierarchical
Reinforcement Learning
Paper
• 2502.15425
• Published
• 9
The Relationship Between Reasoning and Performance in Large Language
Models -- o3 (mini) Thinks Harder, Not Longer
Paper
• 2502.15631
• Published
• 9
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for
Multimodal Reasoning Models
Paper
• 2502.16033
• Published
• 18
Linguistic Generalizability of Test-Time Scaling in Mathematical
Reasoning
Paper
• 2502.17407
• Published
• 25
VEM: Environment-Free Exploration for Training GUI Agent with Value
Environment Model
Paper
• 2502.18906
• Published
• 12
Agentic Reward Modeling: Integrating Human Preferences with Verifiable
Correctness Signals for Reliable Reward Systems
Paper
• 2502.19328
• Published
• 23
START: Self-taught Reasoner with Tools
Paper
• 2503.04625
• Published
• 113
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper
• 2503.01785
• Published
• 86
Chain of Draft: Thinking Faster by Writing Less
Paper
• 2502.18600
• Published
• 50
Process-based Self-Rewarding Language Models
Paper
• 2503.03746
• Published
• 39
DeepSolution: Boosting Complex Engineering Solution Design via
Tree-based Exploration and Bi-point Thinking
Paper
• 2502.20730
• Published
• 38
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four
Habits of Highly Effective STaRs
Paper
• 2503.01307
• Published
• 38
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous
Manipulation on Humanoids
Paper
• 2502.20396
• Published
• 15
Language Models can Self-Improve at State-Value Estimation for Better
Search
Paper
• 2503.02878
• Published
• 10
Does Reinforcement Learning Really Incentivize Reasoning Capacity in
LLMs Beyond the Base Model?
Paper
• 2504.13837
• Published
• 139
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
Paper
• 2502.20545
• Published
• 22
Unified Reward Model for Multimodal Understanding and Generation
Paper
• 2503.05236
• Published
• 123
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through
Two-Stage Rule-Based RL
Paper
• 2503.07536
• Published
• 88
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale
Reinforcement Learning
Paper
• 2503.07365
• Published
• 61
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Paper
• 2503.05132
• Published
• 57
World Modeling Makes a Better Planner: Dual Preference Optimization for
Embodied Task Planning
Paper
• 2503.10480
• Published
• 56
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive
Cognitive-Inspired Sketching
Paper
• 2503.05179
• Published
• 46
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Paper
• 2503.07572
• Published
• 48
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Paper
• 2503.10291
• Published
• 36
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with
Reinforcing Learning
Paper
• 2503.05379
• Published
• 38
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large
Language Models
Paper
• 2503.06749
• Published
• 31
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and
Beyond
Paper
• 2503.10460
• Published
• 30
R1-Searcher: Incentivizing the Search Capability in LLMs via
Reinforcement Learning
Paper
• 2503.05592
• Published
• 27
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via
Reinforcement Learning and Reasoning
Paper
• 2503.07608
• Published
• 23
Implicit Reasoning in Transformers is Reasoning through Shortcuts
Paper
• 2503.07604
• Published
• 23
TTRL: Test-Time Reinforcement Learning
Paper
• 2504.16084
• Published
• 120
Learning from Failures in Multi-Attempt Reinforcement Learning
Paper
• 2503.04808
• Published
• 18
R1-Onevision: Advancing Generalized Multimodal Reasoning through
Cross-Modal Formalization
Paper
• 2503.10615
• Published
• 17
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based
VLM Agent Training
Paper
• 2503.08525
• Published
• 17
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement
Learning
Paper
• 2503.21620
• Published
• 62
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
• 2503.14476
• Published
• 144
Stop Overthinking: A Survey on Efficient Reasoning for Large Language
Models
Paper
• 2503.16419
• Published
• 77
Being-0: A Humanoid Robotic Agent with Vision-Language Models and
Modular Skills
Paper
• 2503.12533
• Published
• 68
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper
• 2503.15558
• Published
• 50
SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?
Paper
• 2503.12349
• Published
• 44
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Paper
• 2503.12605
• Published
• 35
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs
for Knowledge-Intensive Visual Grounding
Paper
• 2503.12797
• Published
• 32
R1-VL: Learning to Reason with Multimodal Large Language Models via
Step-wise Group Relative Policy Optimization
Paper
• 2503.12937
• Published
• 30
MathFusion: Enhancing Mathematical Problem-solving of LLM through
Instruction Fusion
Paper
• 2503.16212
• Published
• 25
MetaLadder: Ascending Mathematical Solution Quality via
Analogical-Problem Reasoning Transfer
Paper
• 2503.14891
• Published
• 22
STEVE: A Step Verification Pipeline for Computer-use Agent Training
Paper
• 2503.12532
• Published
• 17
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning
Tasks
Paper
• 2503.15478
• Published
• 13
Measuring AI Ability to Complete Long Tasks
Paper
• 2503.14499
• Published
• 16
CLS-RL: Image Classification with Rule-Based Reinforcement Learning
Paper
• 2503.16188
• Published
• 13
Temporal Consistency for LLM Reasoning Process Error Identification
Paper
• 2503.14495
• Published
• 11
Free-form language-based robotic reasoning and grasping
Paper
• 2503.13082
• Published
• 11
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process
Errors Identification
Paper
• 2503.12505
• Published
• 11
I Have Covered All the Bases Here: Interpreting Reasoning Features in
Large Language Models via Sparse Autoencoders
Paper
• 2503.18878
• Published
• 119
Video-R1: Reinforcing Video Reasoning in MLLMs
Paper
• 2503.21776
• Published
• 79
Open Deep Search: Democratizing Search with Open-source Reasoning Agents
Paper
• 2503.20201
• Published
• 48
Challenging the Boundaries of Reasoning: An Olympiad-Level Math
Benchmark for Large Language Models
Paper
• 2503.21380
• Published
• 38
Exploring Hallucination of Large Multimodal Models in Video
Understanding: Benchmark, Analysis and Mitigation
Paper
• 2503.19622
• Published
• 31
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for
Open Base Models in the Wild
Paper
• 2503.18892
• Published
• 31
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large
Reasoning Models with Iterative Retrieval Augmented Generation
Paper
• 2503.21729
• Published
• 29
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time
Thinking
Paper
• 2503.19855
• Published
• 29
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning
via Iterative Self-Improvement
Paper
• 2503.17352
• Published
• 24
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for
Embodied Interactive Tasks
Paper
• 2503.21696
• Published
• 23
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models
via Vision-Guided Reinforcement Learning
Paper
• 2503.18013
• Published
• 20
ReSearch: Learning to Reason with Search for LLMs via Reinforcement
Learning
Paper
• 2503.19470
• Published
• 19
FastCuRL: Curriculum Reinforcement Learning with Progressive Context
Extension for Efficient Training R1-like Reasoning Models
Paper
• 2503.17287
• Published
• 11
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
Paper
• 2503.20641
• Published
• 10
Implicit Bias-Like Patterns in Reasoning Models
Paper
• 2503.11572
• Published
• 8
RL Tango: Reinforcing Generator and Verifier Together for Language
Reasoning
Paper
• 2505.15034
• Published
• 5
Improved Visual-Spatial Reasoning via R1-Zero-Like Training
Paper
• 2504.00883
• Published
• 67
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement
Learning on the Base Model
Paper
• 2503.24290
• Published
• 62
JudgeLRM: Large Reasoning Models as a Judge
Paper
• 2504.00050
• Published
• 62
Inference-Time Scaling for Generalist Reward Modeling
Paper
• 2504.02495
• Published
• 58
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large
Language Models
Paper
• 2503.24235
• Published
• 54
Understanding R1-Zero-Like Training: A Critical Perspective
Paper
• 2503.20783
• Published
• 59
Efficient Inference for Large Reasoning Models: A Survey
Paper
• 2503.23077
• Published
• 46
A Survey of Efficient Reasoning for Large Reasoning Models: Language,
Multimodality, and Beyond
Paper
• 2503.21614
• Published
• 43
Exploring the Effect of Reinforcement Learning on Video Understanding:
Insights from SEED-Bench-R1
Paper
• 2503.24376
• Published
• 38
Think Before Recommend: Unleashing the Latent Reasoning Power for
Sequential Recommendation
Paper
• 2503.22675
• Published
• 36
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive
Program Synthesis
Paper
• 2503.23145
• Published
• 35
Rethinking RL Scaling for Vision Language Models: A Transparent,
From-Scratch Framework and Comprehensive Evaluation Scheme
Paper
• 2504.02587
• Published
• 32
Landscape of Thoughts: Visualizing the Reasoning Process of Large
Language Models
Paper
• 2503.22165
• Published
• 28
Z1: Efficient Test-time Scaling with Code
Paper
• 2504.00810
• Published
• 26
Expanding RL with Verifiable Rewards Across Diverse Domains
Paper
• 2503.23829
• Published
• 24
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on
Elementary School-Level Reasoning Problems?
Paper
• 2504.00509
• Published
• 24
ReFeed: Multi-dimensional Summarization Refinement with Reflective
Reasoning on Feedback
Paper
• 2503.21332
• Published
• 23
Effectively Controlling Reasoning Models through Thinking Intervention
Paper
• 2503.24370
• Published
• 19
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for
Large Language Models
Paper
• 2503.24377
• Published
• 18
When To Solve, When To Verify: Compute-Optimal Problem Solving and
Generative Verification for LLM Reasoning
Paper
• 2504.01005
• Published
• 15
GenPRM: Scaling Test-Time Compute of Process Reward Models via
Generative Reasoning
Paper
• 2504.00891
• Published
• 14
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Paper
• 2504.01871
• Published
• 12
m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning
with Large Language Models
Paper
• 2504.00869
• Published
• 10
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies
Ahead
Paper
• 2504.00294
• Published
• 10
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards
for Reasoning-Enhanced Text-to-SQL
Paper
• 2503.23157
• Published
• 10
VerifiAgent: a Unified Verification Agent in Language Model Reasoning
Paper
• 2504.00406
• Published
• 8
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper
• 2504.07128
• Published
• 87
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
Paper
• 2504.05599
• Published
• 85
Rethinking Reflection in Pre-Training
Paper
• 2504.04022
• Published
• 80
T1: Tool-integrated Self-verification for Test-time Compute Scaling in
Small Language Models
Paper
• 2504.04718
• Published
• 43
Missing Premise exacerbates Overthinking: Are Reasoning Models losing
Critical Thinking Skill?
Paper
• 2504.06514
• Published
• 39
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning
Models
Paper
• 2504.04823
• Published
• 31
VAPO: Efficient and Reliable Reinforcement Learning for Advanced
Reasoning Tasks
Paper
• 2504.05118
• Published
• 26
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths
to Reproducibility
Paper
• 2504.07086
• Published
• 21
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual
Reasoning Self-Improvement
Paper
• 2504.07934
• Published
• 21
Self-Steering Language Models
Paper
• 2504.07081
• Published
• 19
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge
Refinement
Paper
• 2504.03561
• Published
• 18
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning
(v1)
Paper
• 2504.03151
• Published
• 15
Generative Evaluation of Complex Reasoning in Large Language Models
Paper
• 2504.02810
• Published
• 14
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement
Fine-Tuning
Paper
• 2504.06958
• Published
• 13
Accelerate Parallelizable Reasoning via Parallel Decoding within One
Sequence
Paper
• 2503.20533
• Published
• 12
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Paper
• 2504.05520
• Published
• 11
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper
• 2504.10481
• Published
• 85
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper
• 2504.11536
• Published
• 63
Genius: A Generalizable and Purely Unsupervised Self-Training Framework
For Advanced Reasoning
Paper
• 2504.08672
• Published
• 55
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models
with Reinforcement Learning
Paper
• 2504.08837
• Published
• 43
How Instruction and Reasoning Data shape Post-Training: Data Quality
through the Lens of Layer-wise Gradients
Paper
• 2504.10766
• Published
• 40
Heimdall: test-time scaling on the generative verification
Paper
• 2504.10337
• Published
• 33
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model
Paper
• 2504.07615
• Published
• 35
SQL-R1: Training Natural Language to SQL Reasoning Model By
Reinforcement Learning
Paper
• 2504.08600
• Published
• 33
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large
Vision-Language Models
Paper
• 2504.11468
• Published
• 30
S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability
of Large Reasoning Models
Paper
• 2504.10368
• Published
• 22
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Paper
• 2504.13055
• Published
• 19
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to
Reinforce
Paper
• 2504.11343
• Published
• 19
Efficient Reasoning Models: A Survey
Paper
• 2504.10903
• Published
• 21
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM
Post-training
Paper
• 2504.09710
• Published
• 19
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
Paper
• 2504.09641
• Published
• 16
Sleep-time Compute: Beyond Inference Scaling at Test-time
Paper
• 2504.13171
• Published
• 15
ReZero: Enhancing LLM search ability by trying one-more-time
Paper
• 2504.11001
• Published
• 16
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper
• 2504.10449
• Published
• 15
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation
through Pretraining, SFT, and RL
Paper
• 2504.11455
• Published
• 14
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via
Agentic Tree Search
Paper
• 2504.08066
• Published
• 16
Reasoning Models Can Be Effective Without Thinking
Paper
• 2504.09858
• Published
• 12
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
Paper
• 2504.09130
• Published
• 12
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
Paper
• 2504.09566
• Published
• 11
Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning
vs. Memorization in Large Language Models
Paper
• 2504.05262
• Published
• 11
SpecReason: Fast and Accurate Inference-Time Compute via Speculative
Reasoning
Paper
• 2504.07891
• Published
• 5
Learning to Reason under Off-Policy Guidance
Paper
• 2504.14945
• Published
• 88
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal
Large Language Models
Paper
• 2504.15279
• Published
• 78
Tina: Tiny Reasoning Models via LoRA
Paper
• 2504.15777
• Published
• 56
FlowReasoner: Reinforcing Query-Level Meta-Agents
Paper
• 2504.15257
• Published
• 47
ToolRL: Reward is All Tool Learning Needs
Paper
• 2504.13958
• Published
• 49
Learning Adaptive Parallel Reasoning with Language Models
Paper
• 2504.15466
• Published
• 44
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in
Large Language Models
Paper
• 2504.16074
• Published
• 36
OTC: Optimal Tool Calls via Reinforcement Learning
Paper
• 2504.14870
• Published
• 35
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery
Simulation
Paper
• 2504.17207
• Published
• 30
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating
Overthinking in Reasoning Models
Paper
• 2504.13367
• Published
• 26
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making
Abilities
Paper
• 2504.16078
• Published
• 21
Generative AI Act II: Test Time Scaling Drives Cognition Engineering
Paper
• 2504.13828
• Published
• 18
Process Reward Models That Think
Paper
• 2504.16828
• Published
• 18
NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards
Paper
• 2511.14659
• Published
• 13