Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2603.08068

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Paper • 2603.12180 • Published 9 days ago • 62
In-Context Reinforcement Learning for Tool Use in Large Language Models

Paper • 2603.08068 • Published 12 days ago • 39

In-Context Reinforcement Learning for Tool Use in Large Language Models

Paper • 2603.08068 • Published 12 days ago • 39
OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published 11 days ago • 133
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning

Paper • 2603.03790 • Published 17 days ago • 119

Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

Paper • 2510.18019 • Published Oct 20, 2025 • 18
PORTool: Tool-Use LLM Training with Rewarded Tree

Paper • 2510.26020 • Published Oct 29, 2025 • 5
POWSM: A Phonetic Open Whisper-Style Speech Foundation Model

Paper • 2510.24992 • Published Oct 28, 2025 • 4
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

Paper • 2510.24821 • Published Oct 28, 2025 • 41

Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

Paper • 2508.09736 • Published Aug 13, 2025 • 58
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5, 2025 • 138
Large Language Model Agent: A Survey on Methodology, Applications and Challenges

Paper • 2503.21460 • Published Mar 27, 2025 • 83
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 236

about 7 hours ago

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Paper • 2410.14059 • Published Oct 17, 2024 • 63
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7, 2025 • 46
Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6, 2025 • 96
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13, 2025 • 53

In-Context Reinforcement Learning for Tool Use in Large Language Models

Paper • 2603.08068 • Published 12 days ago • 39

The Art of Efficient Reasoning: Data, Reward, and Optimization

Paper • 2602.20945 • Published 25 days ago • 7
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Paper • 2309.00267 • Published Sep 1, 2023 • 53
Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning

Paper • 2512.04359 • Published Dec 4, 2025
How Far Can Unsupervised RLVR Scale LLM Training?

Paper • 2603.08660 • Published 12 days ago • 56

Reinforcement Learning

Demystifying Reinforcement Learning in Agentic Reasoning

Paper • 2510.11701 • Published Oct 13, 2025 • 33
LoongRL:Reinforcement Learning for Advanced Reasoning over Long Contexts

Paper • 2510.19363 • Published Oct 22, 2025 • 63
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Paper • 2510.25992 • Published Oct 29, 2025 • 48
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence

Paper • 2511.07384 • Published Nov 10, 2025 • 19

about 8 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8, 2025 • 786 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 38
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30, 2025 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23, 2025 • 88

Reinforcement learning

about 24 hours ago

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30, 2024 • 24
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4, 2025 • 104
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25, 2025 • 75

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Paper • 2603.12180 • Published 9 days ago • 62
In-Context Reinforcement Learning for Tool Use in Large Language Models

Paper • 2603.08068 • Published 12 days ago • 39

In-Context Reinforcement Learning for Tool Use in Large Language Models

Paper • 2603.08068 • Published 12 days ago • 39

In-Context Reinforcement Learning for Tool Use in Large Language Models

Paper • 2603.08068 • Published 12 days ago • 39
OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published 11 days ago • 133
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning

Paper • 2603.03790 • Published 17 days ago • 119

The Art of Efficient Reasoning: Data, Reward, and Optimization

Paper • 2602.20945 • Published 25 days ago • 7
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Paper • 2309.00267 • Published Sep 1, 2023 • 53
Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning

Paper • 2512.04359 • Published Dec 4, 2025
How Far Can Unsupervised RLVR Scale LLM Training?

Paper • 2603.08660 • Published 12 days ago • 56

Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

Paper • 2510.18019 • Published Oct 20, 2025 • 18
PORTool: Tool-Use LLM Training with Rewarded Tree

Paper • 2510.26020 • Published Oct 29, 2025 • 5
POWSM: A Phonetic Open Whisper-Style Speech Foundation Model

Paper • 2510.24992 • Published Oct 28, 2025 • 4
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

Paper • 2510.24821 • Published Oct 28, 2025 • 41

Reinforcement Learning

Demystifying Reinforcement Learning in Agentic Reasoning

Paper • 2510.11701 • Published Oct 13, 2025 • 33
LoongRL:Reinforcement Learning for Advanced Reasoning over Long Contexts

Paper • 2510.19363 • Published Oct 22, 2025 • 63
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Paper • 2510.25992 • Published Oct 29, 2025 • 48
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence

Paper • 2511.07384 • Published Nov 10, 2025 • 19

Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

Paper • 2508.09736 • Published Aug 13, 2025 • 58
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5, 2025 • 138
Large Language Model Agent: A Survey on Methodology, Applications and Challenges

Paper • 2503.21460 • Published Mar 27, 2025 • 83
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 236

about 8 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8, 2025 • 786 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 38
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30, 2025 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23, 2025 • 88

about 7 hours ago

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Paper • 2410.14059 • Published Oct 17, 2024 • 63
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7, 2025 • 46
Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6, 2025 • 96
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13, 2025 • 53

Reinforcement learning

about 24 hours ago

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30, 2024 • 24
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4, 2025 • 104
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25, 2025 • 75

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs