Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning Paper • 2512.07461 • Published Dec 8, 2025 • 76
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining Paper • 2508.10975 • Published Aug 14, 2025 • 60
Technical Report: Full-Stack Fine-Tuning for the Q Programming Language Paper • 2508.06813 • Published Aug 9, 2025 • 6
Mercury: Ultra-Fast Language Models Based on Diffusion Paper • 2506.17298 • Published Jun 17, 2025 • 7
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance Paper • 2507.22448 • Published Jul 30, 2025 • 69
Kimi-VL-A3B-Thinking-2506: A Quick Navigation Article • Jun 21, 2025 • 74
CodeArena: A Collective Evaluation Platform for LLM Code Generation Paper • 2503.01295 • Published Mar 3, 2025 • 8
IterPref: Focal Preference Learning for Code Generation via Iterative Debugging Paper • 2503.02783 • Published Mar 4, 2025 • 6
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition Paper • 2503.00735 • Published Mar 2, 2025 • 23
MPO: Boosting LLM Agents with Meta Plan Optimization Paper • 2503.02682 • Published Mar 4, 2025 • 29
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published Mar 10, 2025 • 47
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers Paper • 2502.20545 • Published Feb 27, 2025 • 22