-
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
Paper • 2602.12036 • Published • 92 -
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper • 2512.17102 • Published • 36 -
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
Paper • 2512.23705 • Published • 45 -
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper • 2512.19995 • Published • 16
Collections
Discover the best community collections!
Collections including paper arxiv:2602.01785
-
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding
Paper • 2602.01785 • Published • 93 -
PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues
Paper • 2601.17277 • Published • 6 -
MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE
Paper • 2602.08961 • Published • 3
-
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
Paper • 2601.15369 • Published • 21 -
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
Paper • 2601.15892 • Published • 53 -
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
Paper • 2601.16208 • Published • 52 -
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
Paper • 2601.11004 • Published • 30
-
The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs
Paper • 2506.18403 • Published • 3 -
ReCode: Updating Code API Knowledge with Reinforcement Learning
Paper • 2506.20495 • Published • 10 -
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
Paper • 2507.23348 • Published • 12 -
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
Paper • 2509.09614 • Published • 7
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 17 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 43
-
Towards Pixel-Level VLM Perception via Simple Points Prediction
Paper • 2601.19228 • Published • 18 -
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
Paper • 2601.19895 • Published • 23 -
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
Paper • 2601.19798 • Published • 42 -
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
Paper • 2601.21639 • Published • 50
-
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Paper • 2410.14059 • Published • 63 -
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46 -
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper • 2503.04130 • Published • 96 -
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Paper • 2503.10639 • Published • 53
-
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
Paper • 2602.12036 • Published • 92 -
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper • 2512.17102 • Published • 36 -
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
Paper • 2512.23705 • Published • 45 -
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper • 2512.19995 • Published • 16
-
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding
Paper • 2602.01785 • Published • 93 -
PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues
Paper • 2601.17277 • Published • 6 -
MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE
Paper • 2602.08961 • Published • 3
-
Towards Pixel-Level VLM Perception via Simple Points Prediction
Paper • 2601.19228 • Published • 18 -
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
Paper • 2601.19895 • Published • 23 -
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
Paper • 2601.19798 • Published • 42 -
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
Paper • 2601.21639 • Published • 50
-
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
Paper • 2601.15369 • Published • 21 -
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
Paper • 2601.15892 • Published • 53 -
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
Paper • 2601.16208 • Published • 52 -
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
Paper • 2601.11004 • Published • 30
-
The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs
Paper • 2506.18403 • Published • 3 -
ReCode: Updating Code API Knowledge with Reinforcement Learning
Paper • 2506.20495 • Published • 10 -
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
Paper • 2507.23348 • Published • 12 -
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
Paper • 2509.09614 • Published • 7
-
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Paper • 2410.14059 • Published • 63 -
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46 -
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper • 2503.04130 • Published • 96 -
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Paper • 2503.10639 • Published • 53
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 17 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 43