ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development
Paper • 2601.11077
Natural Language Processing
CCTU: A Benchmark for Tool Use under Complex Constraints
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning