Collections
Collections including paper arxiv:2603.08068
- In-Context Reinforcement Learning for Tool Use in Large Language Models
  Paper • 2603.08068 • Published • 39
- OpenClaw-RL: Train Any Agent Simply by Talking
  Paper • 2603.10165 • Published • 133
- T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
  Paper • 2603.03790 • Published • 119

- Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution
  Paper • 2510.18019 • Published • 18
- PORTool: Tool-Use LLM Training with Rewarded Tree
  Paper • 2510.26020 • Published • 5
- POWSM: A Phonetic Open Whisper-Style Speech Foundation Model
  Paper • 2510.24992 • Published • 4
- Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
  Paper • 2510.24821 • Published • 41

- Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
  Paper • 2508.09736 • Published • 58
- Agent Lightning: Train ANY AI Agents with Reinforcement Learning
  Paper • 2508.03680 • Published • 138
- Large Language Model Agent: A Survey on Methodology, Applications and Challenges
  Paper • 2503.21460 • Published • 83
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 236

- UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
  Paper • 2410.14059 • Published • 63
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
  Paper • 2503.05179 • Published • 46
- Token-Efficient Long Video Understanding for Multimodal LLMs
  Paper • 2503.04130 • Published • 96
- GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
  Paper • 2503.10639 • Published • 53

- The Art of Efficient Reasoning: Data, Reward, and Optimization
  Paper • 2602.20945 • Published • 7
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 53
- Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning
  Paper • 2512.04359 • Published
- How Far Can Unsupervised RLVR Scale LLM Training?
  Paper • 2603.08660 • Published • 56

- Demystifying Reinforcement Learning in Agentic Reasoning
  Paper • 2510.11701 • Published • 33
- LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
  Paper • 2510.19363 • Published • 63
- Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
  Paper • 2510.25992 • Published • 48
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
  Paper • 2511.07384 • Published • 19

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 786 • 98
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 38
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88

- Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
  Paper • 2407.20798 • Published • 24
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 104
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
  Paper • 2502.18449 • Published • 75