kaizuberbuehler's Collection: Reasoning, Thinking, RL and Test-Time Scaling
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via
Collective Monte Carlo Tree Search
Paper
• 2412.18319
• Published
• 39
Token-Budget-Aware LLM Reasoning
Paper
• 2412.18547
• Published
• 46
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper
• 2412.20993
• Published
• 36
B-STaR: Monitoring and Balancing Exploration and Exploitation in
Self-Taught Reasoners
Paper
• 2412.17256
• Published
• 47
Paper
• 2412.16720
• Published
• 37
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
Paper
• 2412.17498
• Published
• 22
Outcome-Refining Process Supervision for Code Generation
Paper
• 2412.15118
• Published
• 19
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's
Reasoning Capability
Paper
• 2411.19943
• Published
• 62
MALT: Improving Reasoning with Multi-Agent LLM Training
Paper
• 2412.01928
• Published
• 45
Mars-PO: Multi-Agent Reasoning System Preference Optimization
Paper
• 2411.19039
• Published
• 1
Flow-DPO: Improving LLM Mathematical Reasoning through Online
Multi-Agent Learning
Paper
• 2410.22304
• Published
• 18
o1-Coder: an o1 Replication for Coding
Paper
• 2412.00154
• Published
• 44
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper
• 2411.14405
• Published
• 61
OpenR: An Open Source Framework for Advanced Reasoning with Large
Language Models
Paper
• 2410.09671
• Published
• 1
SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree
Search for Code Generation
Paper
• 2411.11053
• Published
• 4
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context
Learning via MCTS
Paper
• 2411.18478
• Published
• 37
Reverse Thinking Makes LLMs Stronger Reasoners
Paper
• 2411.19865
• Published
• 23
Enhancing LLM Reasoning via Critique Models with Test-Time and
Training-Time Supervision
Paper
• 2411.16579
• Published
• 3
Vision-Language Models Can Self-Improve Reasoning via Reflection
Paper
• 2411.00855
• Published
• 5
Language Models are Hidden Reasoners: Unlocking Latent Reasoning
Capabilities via Self-Rewarding
Paper
• 2411.04282
• Published
• 37
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large
Language Models
Paper
• 2411.14432
• Published
• 25
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Paper
• 2411.18203
• Published
• 40
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple
Distillation, Big Progress or Bitter Lesson?
Paper
• 2411.16489
• Published
• 45
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained
Video Reasoning via Core Frame Selection
Paper
• 2411.14794
• Published
• 13
Enhancing the Reasoning Ability of Multimodal Large Language Models via
Mixed Preference Optimization
Paper
• 2411.10442
• Published
• 87
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level
Mathematical Reasoning
Paper
• 2410.02884
• Published
• 54
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper
• 2411.10440
• Published
• 129
Large Language Models Can Self-Improve in Long-context Reasoning
Paper
• 2411.08147
• Published
• 65
Self-Consistency Preference Optimization
Paper
• 2411.04109
• Published
• 19
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep
Thinking
Paper
• 2501.04519
• Published
• 288
URSA: Understanding and Verifying Chain-of-thought Reasoning in
Multimodal Mathematics
Paper
• 2501.04686
• Published
• 53
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta
Chain-of-Thought
Paper
• 2501.04682
• Published
• 99
BoostStep: Boosting mathematical capability of Large Language Models via
improved single-step reasoning
Paper
• 2501.03226
• Published
• 43
Test-time Computing: from System-1 Thinking to System-2 Thinking
Paper
• 2501.02497
• Published
• 45
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper
• 2501.01904
• Published
• 33
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Paper
• 2412.21187
• Published
• 40
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
• 2501.05366
• Published
• 102
The Lessons of Developing Process Reward Models in Mathematical
Reasoning
Paper
• 2501.07301
• Published
• 100
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical
Reasoning
Paper
• 2501.06458
• Published
• 31
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Paper
• 2501.06186
• Published
• 65
OmniThink: Expanding Knowledge Boundaries in Machine Writing through
Thinking
Paper
• 2501.09751
• Published
• 46
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with
Large Language Models
Paper
• 2501.09686
• Published
• 41
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
Reinforcement Learning
Paper
• 2501.12948
• Published
• 440
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper
• 2501.12599
• Published
• 126
s1: Simple test-time scaling
Paper
• 2501.19393
• Published
• 124
Demystifying Long Chain-of-Thought Reasoning in LLMs
Paper
• 2502.03373
• Published
• 58
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time
Scaling
Paper
• 2502.06703
• Published
• 152
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model
Post-training
Paper
• 2501.17161
• Published
• 124
On the Emergence of Thinking in LLMs I: Searching for the Right
Intuition
Paper
• 2502.06773
• Published
• 1
Competitive Programming with Large Reasoning Models
Paper
• 2502.06807
• Published
• 69
Evolving Deeper LLM Thinking
Paper
• 2501.09891
• Published
• 115
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs)
More Self-Confident Even When They Are Wrong
Paper
• 2501.09775
• Published
• 32
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward
Model
Paper
• 2501.12368
• Published
• 45
Reasoning Language Models: A Blueprint
Paper
• 2501.11223
• Published
• 33
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Paper
• 2501.12570
• Published
• 28
Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament
Paper
• 2501.13007
• Published
• 19
Can We Generate Images with CoT? Let's Verify and Reinforce Image
Generation Step by Step
Paper
• 2501.13926
• Published
• 43
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary
Feedback
Paper
• 2501.10799
• Published
• 15
Chain-of-Retrieval Augmented Generation
Paper
• 2501.14342
• Published
• 58
RL + Transformer = A General-Purpose Problem Solver
Paper
• 2501.14176
• Published
• 28
Towards General-Purpose Model-Free Reinforcement Learning
Paper
• 2501.16142
• Published
• 31
Atla Selene Mini: A General Purpose Evaluation Model
Paper
• 2501.17195
• Published
• 35
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper
• 2501.18585
• Published
• 61
Large Language Models Think Too Fast To Explore Effectively
Paper
• 2501.18009
• Published
• 23
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Paper
• 2501.19324
• Published
• 39
Process Reinforcement through Implicit Rewards
Paper
• 2502.01456
• Published
• 62
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Paper
• 2502.13124
• Published
• 7
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
Paper
• 2502.01718
• Published
• 28
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM
Reasoning via Autoregressive Search
Paper
• 2502.02508
• Published
• 22
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Paper
• 2502.02584
• Published
• 16
LIMO: Less is More for Reasoning
Paper
• 2502.03387
• Published
• 62
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper
• 2502.02339
• Published
• 23
On Teacher Hacking in Language Model Distillation
Paper
• 2502.02671
• Published
• 18
Token Assorted: Mixing Latent and Text Tokens for Improved Language
Model Reasoning
Paper
• 2502.03275
• Published
• 18
Gold-medalist Performance in Solving Olympiad Geometry with
AlphaGeometry2
Paper
• 2502.03544
• Published
• 44
BOLT: Bootstrap Long Chain-of-Thought in Language Models without
Distillation
Paper
• 2502.03860
• Published
• 25
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth
Approach
Paper
• 2502.05171
• Published
• 152
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM
Guardrails
Paper
• 2502.05163
• Published
• 22
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of
Language Models
Paper
• 2502.04404
• Published
• 25
Generating Symbolic World Models via Test-time Scaling of Large Language
Models
Paper
• 2502.04728
• Published
• 19
Exploring the Limit of Outcome Reward for Learning Mathematical
Reasoning
Paper
• 2502.06781
• Published
• 58
Training Language Models for Social Deduction with Multi-Agent
Reinforcement Learning
Paper
• 2502.06060
• Published
• 38
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
Paper
• 2502.06772
• Published
• 21
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
Paper
• 2502.07316
• Published
• 50
LLMs Can Easily Learn to Reason from Demonstrations: Structure, not
content, is what matters!
Paper
• 2502.07374
• Published
• 40
Teaching Language Models to Critique via Reinforcement Learning
Paper
• 2502.03492
• Published
• 24
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance
Paper
• 2502.08127
• Published
• 59
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to
Enhance RL Fine-Tuning
Paper
• 2502.06533
• Published
• 17
Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging -- An Open Recipe
Paper
• 2502.09056
• Published
• 31
SelfCite: Self-Supervised Alignment for Context Attribution in Large
Language Models
Paper
• 2502.09604
• Published
• 37
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for
Reasoning Quality, Robustness, and Efficiency
Paper
• 2502.09621
• Published
• 28
Logical Reasoning in Large Language Models: A Survey
Paper
• 2502.09100
• Published
• 24
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced
Chain-of-Thought in Large Language Models
Paper
• 2502.09390
• Published
• 16
Typhoon T1: An Open Thai Reasoning Model
Paper
• 2502.09042
• Published
• 16
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
Paper
• 2502.09601
• Published
• 14
Mathematical Reasoning in Large Language Models: Assessing Logical and
Arithmetic Errors across Wide Numerical Ranges
Paper
• 2502.08680
• Published
• 11
Small Models Struggle to Learn from Strong Reasoners
Paper
• 2502.12143
• Published
• 39
S*: Test Time Scaling for Code Generation
Paper
• 2502.14382
• Published
• 63
Diverse Inference and Verification for Advanced Reasoning
Paper
• 2502.09955
• Published
• 18
Search-R1: Training LLMs to Reason and Leverage Search Engines with
Reinforcement Learning
Paper
• 2503.09516
• Published
• 38
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open
Software Evolution
Paper
• 2502.18449
• Published
• 75
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement
Learning
Paper
• 2502.14768
• Published
• 47
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via
GRPO
Paper
• 2502.14669
• Published
• 15
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning
in Diffusion Models
Paper
• 2502.10458
• Published
• 38
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement
Learning
Paper
• 2502.12853
• Published
• 29
Thinking Preference Optimization
Paper
• 2502.13173
• Published
• 17
Self-rewarding correction for mathematical reasoning
Paper
• 2502.19613
• Published
• 82
Can Large Language Models Detect Errors in Long Chain-of-Thought
Reasoning?
Paper
• 2502.19361
• Published
• 28
LightThinker: Thinking Step-by-Step Compression
Paper
• 2502.15589
• Published
• 31
Reinforcement Learning for Reasoning in Small LLMs: What Works and What
Doesn't
Paper
• 2503.16219
• Published
• 52
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning
Learning
Paper
• 2502.19735
• Published
• 9
PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning
Trajectories for Complex Problem Solving
Paper
• 2502.16111
• Published
• 9
TAG: A Decentralized Framework for Multi-Agent Hierarchical
Reinforcement Learning
Paper
• 2502.15425
• Published
• 9
The Relationship Between Reasoning and Performance in Large Language
Models -- o3 (mini) Thinks Harder, Not Longer
Paper
• 2502.15631
• Published
• 9
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for
Multimodal Reasoning Models
Paper
• 2502.16033
• Published
• 18
Linguistic Generalizability of Test-Time Scaling in Mathematical
Reasoning
Paper
• 2502.17407
• Published
• 25
VEM: Environment-Free Exploration for Training GUI Agent with Value
Environment Model
Paper
• 2502.18906
• Published
• 12
Agentic Reward Modeling: Integrating Human Preferences with Verifiable
Correctness Signals for Reliable Reward Systems
Paper
• 2502.19328
• Published
• 23
START: Self-taught Reasoner with Tools
Paper
• 2503.04625
• Published
• 113
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper
• 2503.01785
• Published
• 86
Chain of Draft: Thinking Faster by Writing Less
Paper
• 2502.18600
• Published
• 50
Process-based Self-Rewarding Language Models
Paper
• 2503.03746
• Published
• 39
DeepSolution: Boosting Complex Engineering Solution Design via
Tree-based Exploration and Bi-point Thinking
Paper
• 2502.20730
• Published
• 38
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four
Habits of Highly Effective STaRs
Paper
• 2503.01307
• Published
• 38
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous
Manipulation on Humanoids
Paper
• 2502.20396
• Published
• 15
Language Models can Self-Improve at State-Value Estimation for Better
Search
Paper
• 2503.02878
• Published
• 10
Does Reinforcement Learning Really Incentivize Reasoning Capacity in
LLMs Beyond the Base Model?
Paper
• 2504.13837
• Published
• 139
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
Paper
• 2502.20545
• Published
• 22
Unified Reward Model for Multimodal Understanding and Generation
Paper
• 2503.05236
• Published
• 123
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through
Two-Stage Rule-Based RL
Paper
• 2503.07536
• Published
• 88
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale
Reinforcement Learning
Paper
• 2503.07365
• Published
• 61
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Paper
• 2503.05132
• Published
• 57
World Modeling Makes a Better Planner: Dual Preference Optimization for
Embodied Task Planning
Paper
• 2503.10480
• Published
• 56
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive
Cognitive-Inspired Sketching
Paper
• 2503.05179
• Published
• 46
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Paper
• 2503.07572
• Published
• 48
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Paper
• 2503.10291
• Published
• 36
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with
Reinforcing Learning
Paper
• 2503.05379
• Published
• 38
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large
Language Models
Paper
• 2503.06749
• Published
• 31
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and
Beyond
Paper
• 2503.10460
• Published
• 30
R1-Searcher: Incentivizing the Search Capability in LLMs via
Reinforcement Learning
Paper
• 2503.05592
• Published
• 27
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via
Reinforcement Learning and Reasoning
Paper
• 2503.07608
• Published
• 23
Implicit Reasoning in Transformers is Reasoning through Shortcuts
Paper
• 2503.07604
• Published
• 23
TTRL: Test-Time Reinforcement Learning
Paper
• 2504.16084
• Published
• 120
Learning from Failures in Multi-Attempt Reinforcement Learning
Paper
• 2503.04808
• Published
• 18
R1-Onevision: Advancing Generalized Multimodal Reasoning through
Cross-Modal Formalization
Paper
• 2503.10615
• Published
• 17
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based
VLM Agent Training
Paper
• 2503.08525
• Published
• 17
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement
Learning
Paper
• 2503.21620
• Published
• 62
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
• 2503.14476
• Published
• 144
Stop Overthinking: A Survey on Efficient Reasoning for Large Language
Models
Paper
• 2503.16419
• Published
• 77
Being-0: A Humanoid Robotic Agent with Vision-Language Models and
Modular Skills
Paper
• 2503.12533
• Published
• 68
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper
• 2503.15558
• Published
• 50
SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?
Paper
• 2503.12349
• Published
• 44
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Paper
• 2503.12605
• Published
• 35
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs
for Knowledge-Intensive Visual Grounding
Paper
• 2503.12797
• Published
• 32
R1-VL: Learning to Reason with Multimodal Large Language Models via
Step-wise Group Relative Policy Optimization
Paper
• 2503.12937
• Published
• 30
MathFusion: Enhancing Mathematical Problem-solving of LLM through
Instruction Fusion
Paper
• 2503.16212
• Published
• 25
MetaLadder: Ascending Mathematical Solution Quality via
Analogical-Problem Reasoning Transfer
Paper
• 2503.14891
• Published
• 22
STEVE: A Step Verification Pipeline for Computer-use Agent Training
Paper
• 2503.12532
• Published
• 17
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning
Tasks
Paper
• 2503.15478
• Published
• 13
Measuring AI Ability to Complete Long Tasks
Paper
• 2503.14499
• Published
• 16
CLS-RL: Image Classification with Rule-Based Reinforcement Learning
Paper
• 2503.16188
• Published
• 13
Temporal Consistency for LLM Reasoning Process Error Identification
Paper
• 2503.14495
• Published
• 11
Free-form language-based robotic reasoning and grasping
Paper
• 2503.13082
• Published
• 11
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process
Errors Identification
Paper
• 2503.12505
• Published
• 11
I Have Covered All the Bases Here: Interpreting Reasoning Features in
Large Language Models via Sparse Autoencoders
Paper
• 2503.18878
• Published
• 119
Video-R1: Reinforcing Video Reasoning in MLLMs
Paper
• 2503.21776
• Published
• 79
Open Deep Search: Democratizing Search with Open-source Reasoning Agents
Paper
• 2503.20201
• Published
• 48
Challenging the Boundaries of Reasoning: An Olympiad-Level Math
Benchmark for Large Language Models
Paper
• 2503.21380
• Published
• 38
Exploring Hallucination of Large Multimodal Models in Video
Understanding: Benchmark, Analysis and Mitigation
Paper
• 2503.19622
• Published
• 31
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for
Open Base Models in the Wild
Paper
• 2503.18892
• Published
• 31
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large
Reasoning Models with Iterative Retrieval Augmented Generation
Paper
• 2503.21729
• Published
• 29
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time
Thinking
Paper
• 2503.19855
• Published
• 29
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning
via Iterative Self-Improvement
Paper
• 2503.17352
• Published
• 24
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for
Embodied Interactive Tasks
Paper
• 2503.21696
• Published
• 23
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models
via Vision-Guided Reinforcement Learning
Paper
• 2503.18013
• Published
• 20
ReSearch: Learning to Reason with Search for LLMs via Reinforcement
Learning
Paper
• 2503.19470
• Published
• 19
FastCuRL: Curriculum Reinforcement Learning with Progressive Context
Extension for Efficient Training R1-like Reasoning Models
Paper
• 2503.17287
• Published
• 11
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
Paper
• 2503.20641
• Published
• 10
Implicit Bias-Like Patterns in Reasoning Models
Paper
• 2503.11572
• Published
• 8
RL Tango: Reinforcing Generator and Verifier Together for Language
Reasoning
Paper
• 2505.15034
• Published
• 5
Improved Visual-Spatial Reasoning via R1-Zero-Like Training
Paper
• 2504.00883
• Published
• 67
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement
Learning on the Base Model
Paper
• 2503.24290
• Published
• 62
JudgeLRM: Large Reasoning Models as a Judge
Paper
• 2504.00050
• Published
• 62
Inference-Time Scaling for Generalist Reward Modeling
Paper
• 2504.02495
• Published
• 58
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large
Language Models
Paper
• 2503.24235
• Published
• 54
Understanding R1-Zero-Like Training: A Critical Perspective
Paper
• 2503.20783
• Published
• 59
Efficient Inference for Large Reasoning Models: A Survey
Paper
• 2503.23077
• Published
• 46
A Survey of Efficient Reasoning for Large Reasoning Models: Language,
Multimodality, and Beyond
Paper
• 2503.21614
• Published
• 43
Exploring the Effect of Reinforcement Learning on Video Understanding:
Insights from SEED-Bench-R1
Paper
• 2503.24376
• Published
• 38
Think Before Recommend: Unleashing the Latent Reasoning Power for
Sequential Recommendation
Paper
• 2503.22675
• Published
• 36
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive
Program Synthesis
Paper
• 2503.23145
• Published
• 35
Rethinking RL Scaling for Vision Language Models: A Transparent,
From-Scratch Framework and Comprehensive Evaluation Scheme
Paper
• 2504.02587
• Published
• 32
Landscape of Thoughts: Visualizing the Reasoning Process of Large
Language Models
Paper
• 2503.22165
• Published
• 28
Z1: Efficient Test-time Scaling with Code
Paper
• 2504.00810
• Published
• 26
Expanding RL with Verifiable Rewards Across Diverse Domains
Paper
• 2503.23829
• Published
• 24
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on
Elementary School-Level Reasoning Problems?
Paper
• 2504.00509
• Published
• 24
ReFeed: Multi-dimensional Summarization Refinement with Reflective
Reasoning on Feedback
Paper
• 2503.21332
• Published
• 23
Effectively Controlling Reasoning Models through Thinking Intervention
Paper
• 2503.24370
• Published
• 19
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for
Large Language Models
Paper
• 2503.24377
• Published
• 18
When To Solve, When To Verify: Compute-Optimal Problem Solving and
Generative Verification for LLM Reasoning
Paper
• 2504.01005
• Published
• 15
GenPRM: Scaling Test-Time Compute of Process Reward Models via
Generative Reasoning
Paper
• 2504.00891
• Published
• 14
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Paper
• 2504.01871
• Published
• 12
m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning
with Large Language Models
Paper
• 2504.00869
• Published
• 10
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies
Ahead
Paper
• 2504.00294
• Published
• 10
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards
for Reasoning-Enhanced Text-to-SQL
Paper
• 2503.23157
• Published
• 10
VerifiAgent: a Unified Verification Agent in Language Model Reasoning
Paper
• 2504.00406
• Published
• 8
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper
• 2504.07128
• Published
• 87
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
Paper
• 2504.05599
• Published
• 85
Rethinking Reflection in Pre-Training
Paper
• 2504.04022
• Published
• 80
T1: Tool-integrated Self-verification for Test-time Compute Scaling in
Small Language Models
Paper
• 2504.04718
• Published
• 43
Missing Premise exacerbates Overthinking: Are Reasoning Models losing
Critical Thinking Skill?
Paper
• 2504.06514
• Published
• 39
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning
Models
Paper
• 2504.04823
• Published
• 31
VAPO: Efficient and Reliable Reinforcement Learning for Advanced
Reasoning Tasks
Paper
• 2504.05118
• Published
• 26
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths
to Reproducibility
Paper
• 2504.07086
• Published
• 21
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual
Reasoning Self-Improvement
Paper
• 2504.07934
• Published
• 21
Self-Steering Language Models
Paper
• 2504.07081
• Published
• 19
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge
Refinement
Paper
• 2504.03561
• Published
• 18
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning
(v1)
Paper
• 2504.03151
• Published
• 15
Generative Evaluation of Complex Reasoning in Large Language Models
Paper
• 2504.02810
• Published
• 14
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement
Fine-Tuning
Paper
• 2504.06958
• Published
• 13
Accelerate Parallelizable Reasoning via Parallel Decoding within One
Sequence
Paper
• 2503.20533
• Published
• 12
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Paper
• 2504.05520
• Published
• 11
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper
• 2504.10481
• Published
• 85
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper
• 2504.11536
• Published
• 63
Genius: A Generalizable and Purely Unsupervised Self-Training Framework
For Advanced Reasoning
Paper
• 2504.08672
• Published
• 55
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models
with Reinforcement Learning
Paper
• 2504.08837
• Published
• 43
How Instruction and Reasoning Data shape Post-Training: Data Quality
through the Lens of Layer-wise Gradients
Paper
• 2504.10766
• Published
• 40
Heimdall: test-time scaling on the generative verification
Paper
• 2504.10337
• Published
• 33
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model
Paper
• 2504.07615
• Published
• 35
SQL-R1: Training Natural Language to SQL Reasoning Model By
Reinforcement Learning
Paper
• 2504.08600
• Published
• 33
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large
Vision-Language Models
Paper
• 2504.11468
• Published
• 30
S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability
of Large Reasoning Models
Paper
• 2504.10368
• Published
• 22
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Paper
• 2504.13055
• Published
• 19
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to
Reinforce
Paper
• 2504.11343
• Published
• 19
Efficient Reasoning Models: A Survey
Paper
• 2504.10903
• Published
• 21
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM
Post-training
Paper
• 2504.09710
• Published
• 19
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
Paper
• 2504.09641
• Published
• 16
Sleep-time Compute: Beyond Inference Scaling at Test-time
Paper
• 2504.13171
• Published
• 15
ReZero: Enhancing LLM search ability by trying one-more-time
Paper
• 2504.11001
• Published
• 16
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper
• 2504.10449
• Published
• 15
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation
through Pretraining, SFT, and RL
Paper
• 2504.11455
• Published
• 14
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via
Agentic Tree Search
Paper
• 2504.08066
• Published
• 16
Reasoning Models Can Be Effective Without Thinking
Paper
• 2504.09858
• Published
• 12
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
Paper
• 2504.09130
• Published
• 12
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
Paper
• 2504.09566
• Published
• 11
Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning
vs. Memorization in Large Language Models
Paper
• 2504.05262
• Published
• 11
SpecReason: Fast and Accurate Inference-Time Compute via Speculative
Reasoning
Paper
• 2504.07891
• Published
• 5
Learning to Reason under Off-Policy Guidance
Paper
• 2504.14945
• Published
• 88
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal
Large Language Models
Paper
• 2504.15279
• Published
• 78
Tina: Tiny Reasoning Models via LoRA
Paper
• 2504.15777
• Published
• 56
FlowReasoner: Reinforcing Query-Level Meta-Agents
Paper
• 2504.15257
• Published
• 47
ToolRL: Reward is All Tool Learning Needs
Paper
• 2504.13958
• Published
• 49
Learning Adaptive Parallel Reasoning with Language Models
Paper
• 2504.15466
• Published
• 44
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in
Large Language Models
Paper
• 2504.16074
• Published
• 36
OTC: Optimal Tool Calls via Reinforcement Learning
Paper
• 2504.14870
• Published
• 35
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery
Simulation
Paper
• 2504.17207
• Published
• 30
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating
Overthinking in Reasoning Models
Paper
• 2504.13367
• Published
• 26
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making
Abilities
Paper
• 2504.16078
• Published
• 21
Generative AI Act II: Test Time Scaling Drives Cognition Engineering
Paper
• 2504.13828
• Published
• 18
Process Reward Models That Think
Paper
• 2504.16828
• Published
• 18
NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards
Paper
• 2511.14659
• Published
• 13