Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22, 2024 • 18
Ming-V2 Collection Ming is the multi-modal series of any-to-any models developed by the Ant Ling team. • 11 items • Updated about 8 hours ago • 33
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published 8 days ago • 294
NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models Paper • 2602.06694 • Published 6 days ago • 12
Data Science and Technology Towards AGI Part I: Tiered Data Management Paper • 2602.09003 • Published 3 days ago • 5
Improving Data and Reward Design for Scientific Reasoning in Large Language Models Paper • 2602.08321 • Published 4 days ago • 37
UI-Venus Technical Report: Building High-performance UI Agents with RFT Paper • 2508.10833 • Published Aug 14, 2025 • 45
Reinforcement World Model Learning for LLM-based Agents Paper • 2602.05842 • Published 7 days ago • 25
Privileged Information Distillation for Language Models Paper • 2602.04942 • Published 8 days ago • 24
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning Paper • 2601.05593 • Published Jan 9 • 83
RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents Paper • 2602.02486 • Published 10 days ago • 17
ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought Paper • 2601.23184 • Published 13 days ago • 35
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published 10 days ago • 125
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published 12 days ago • 39
FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents Paper • 2602.01566 • Published 11 days ago • 46
THINKSAFE: Self-Generated Safety Alignment for Reasoning Models Paper • 2601.23143 • Published 13 days ago • 38
ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas Paper • 2601.21558 • Published 14 days ago • 58
Beyond Imitation: Reinforcement Learning for Active Latent Planning Paper • 2601.21598 • Published 14 days ago • 9