Running on CPU Upgrade Featured 2.98k The Smol Training Playbook 📚 2.98k The secrets to building world-class LLMs
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 225
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Paper • 2601.02151 • Published Jan 5 • 109
NV-Embed Collection NV-Embed is a generalist embedding model encompassing retrieval, reranking, classification, clustering, STS tasks. • 3 items • Updated 11 days ago • 17
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process Paper • 2512.23988 • Published Dec 30, 2025 • 17
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling Paper • 2512.23959 • Published Dec 30, 2025 • 112
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space Paper • 2512.24617 • Published Dec 31, 2025 • 64
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published Dec 29, 2025 • 98
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models Paper • 2512.24618 • Published Dec 31, 2025 • 151
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26, 2025 • 135
Quantile Advantage Estimation for Entropy-Safe Reasoning Paper • 2509.22611 • Published Sep 26, 2025 • 118