kaizuberbuehler's Collections: Synthetic Data and Self-Improvement
Training Software Engineering Agents and Verifiers with SWE-Gym
Paper
• 2412.21139
• Published
• 25
Evaluating Language Models as Synthetic Data Generators
Paper
• 2412.03679
• Published
• 47
Self-Rewarding Language Models
Paper
• 2401.10020
• Published
• 152
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper
• 2402.03620
• Published
• 117
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper
• 2402.07456
• Published
• 46
Learning From Mistakes Makes LLM Better Reasoner
Paper
• 2310.20689
• Published
• 29
Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper
• 2404.07503
• Published
• 31
Direct Nash Optimization: Teaching Language Models to Self-Improve with
General Preferences
Paper
• 2404.03715
• Published
• 62
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models
with a Self-Critique Pipeline
Paper
• 2404.02893
• Published
• 21
Voyager: An Open-Ended Embodied Agent with Large Language Models
Paper
• 2305.16291
• Published
• 13
Reflexion: Language Agents with Verbal Reinforcement Learning
Paper
• 2303.11366
• Published
• 5
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of
Diverse Models
Paper
• 2404.18796
• Published
• 71
Extending Llama-3's Context Ten-Fold Overnight
Paper
• 2404.19553
• Published
• 34
Diffusion for World Modeling: Visual Details Matter in Atari
Paper
• 2405.12399
• Published
• 30
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small
Reference Models
Paper
• 2405.20541
• Published
• 24
ShareGPT4Video: Improving Video Understanding and Generation with Better
Captions
Paper
• 2406.04325
• Published
• 74
Paper
• 2406.09414
• Published
• 103
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs
with Nothing
Paper
• 2406.08464
• Published
• 71
Scaling Synthetic Data Creation with 1,000,000,000 Personas
Paper
• 2406.20094
• Published
• 104
Diffusion Augmented Agents: A Framework for Efficient Exploration and
Transfer Learning
Paper
• 2407.20798
• Published
• 24
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper
• 2408.06195
• Published
• 73
Data curation via joint example selection further accelerates multimodal
learning
Paper
• 2406.17711
• Published
• 3
Training Language Models to Self-Correct via Reinforcement Learning
Paper
• 2409.12917
• Published
• 140
Thinking LLMs: General Instruction Following with Thought Generation
Paper
• 2410.10630
• Published
• 20
How to Synthesize Text Data without Model Collapse?
Paper
• 2412.14689
• Published
• 53
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity
Visual Descriptions
Paper
• 2412.08737
• Published
• 54
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse
Task Synthesis
Paper
• 2412.19723
• Published
• 87
ProgCo: Program Helps Self-Correction of Large Language Models
Paper
• 2501.01264
• Published
• 26
RobustFT: Robust Supervised Fine-tuning for Large Language Models under
Noisy Response
Paper
• 2412.14922
• Published
• 88
B-STaR: Monitoring and Balancing Exploration and Exploitation in
Self-Taught Reasoners
Paper
• 2412.17256
• Published
• 47
Diving into Self-Evolving Training for Multimodal Reasoning
Paper
• 2412.17451
• Published
• 42
Paper
• 2412.16720
• Published
• 37
ResearchTown: Simulator of Human Research Community
Paper
• 2412.17767
• Published
• 14
SPaR: Self-Play with Tree-Search Refinement to Improve
Instruction-Following in Large Language Models
Paper
• 2412.11605
• Published
• 18
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web
Tutorials
Paper
• 2412.09605
• Published
• 30
SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree
Search for Code Generation
Paper
• 2411.11053
• Published
• 4
CodeDPO: Aligning Code Models with Self Generated and Verified Source
Code
Paper
• 2410.05605
• Published
• 1
Enhancing LLM Reasoning via Critique Models with Test-Time and
Training-Time Supervision
Paper
• 2411.16579
• Published
• 3
Vision-Language Models Can Self-Improve Reasoning via Reflection
Paper
• 2411.00855
• Published
• 5
Language Models are Hidden Reasoners: Unlocking Latent Reasoning
Capabilities via Self-Rewarding
Paper
• 2411.04282
• Published
• 37
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large
Language Models
Paper
• 2411.14432
• Published
• 25
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Paper
• 2411.18203
• Published
• 40
From Generation to Judgment: Opportunities and Challenges of
LLM-as-a-judge
Paper
• 2411.16594
• Published
• 39
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level
Mathematical Reasoning
Paper
• 2410.02884
• Published
• 54
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning
for Web Agents
Paper
• 2411.06559
• Published
• 16
Generative World Explorer
Paper
• 2411.11844
• Published
• 77
Large Language Models Can Self-Improve in Long-context Reasoning
Paper
• 2411.08147
• Published
• 65
Stronger Models are NOT Stronger Teachers for Instruction Tuning
Paper
• 2411.07133
• Published
• 38
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions
Paper
• 2411.07461
• Published
• 23
Self-Consistency Preference Optimization
Paper
• 2411.04109
• Published
• 19
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta
Chain-of-Thought
Paper
• 2501.04682
• Published
• 99
Enabling Scalable Oversight via Self-Evolving Critic
Paper
• 2501.05727
• Published
• 72
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
Paper
• 2501.05707
• Published
• 20
Agent-R: Training Language Model Agents to Reflect via Iterative
Self-Training
Paper
• 2501.11425
• Published
• 109
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in
Realistic Environments
Paper
• 2501.10893
• Published
• 26
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI
Systems
Paper
• 2501.11067
• Published
• 13
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model
Critiques
Paper
• 2501.14492
• Published
• 27
Baichuan-Omni-1.5 Technical Report
Paper
• 2501.15368
• Published
• 60
Critique Fine-Tuning: Learning to Critique is More Effective than
Learning to Imitate
Paper
• 2501.17703
• Published
• 59
Atla Selene Mini: A General Purpose Evaluation Model
Paper
• 2501.17195
• Published
• 35
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in
Post-Training
Paper
• 2501.18511
• Published
• 20
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper
• 2502.01534
• Published
• 40
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Paper
• 2502.13124
• Published
• 7
On Teacher Hacking in Language Model Distillation
Paper
• 2502.02671
• Published
• 18
MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus
Expansion
Paper
• 2502.04235
• Published
• 23
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM
Guardrails
Paper
• 2502.05163
• Published
• 22
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data
Annotators
Paper
• 2502.06394
• Published
• 89
Teaching Language Models to Critique via Reinforcement Learning
Paper
• 2502.03492
• Published
• 24
Distillation Scaling Laws
Paper
• 2502.08606
• Published
• 47
mmE5: Improving Multimodal Multilingual Embeddings via High-quality
Synthetic Data
Paper
• 2502.08468
• Published
• 16
CLIPPER: Compression enables long-context synthetic data generation
Paper
• 2502.14854
• Published
• 11
Can Large Language Models Detect Errors in Long Chain-of-Thought
Reasoning?
Paper
• 2502.19361
• Published
• 28
PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning
Trajectories for Complex Problem Solving
Paper
• 2502.16111
• Published
• 9
VEM: Environment-Free Exploration for Training GUI Agent with Value
Environment Model
Paper
• 2502.18906
• Published
• 12
Predictive Data Selection: The Data That Predicts Is the Data That
Teaches
Paper
• 2503.00808
• Published
• 56
Process-based Self-Rewarding Language Models
Paper
• 2503.03746
• Published
• 39
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for
Coding
Paper
• 2503.02951
• Published
• 33
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
Paper
• 2503.00735
• Published
• 23
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four
Habits of Highly Effective STaRs
Paper
• 2503.01307
• Published
• 38
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous
Manipulation on Humanoids
Paper
• 2502.20396
• Published
• 15
TTRL: Test-Time Reinforcement Learning
Paper
• 2504.16084
• Published
• 120
Self-Taught Self-Correction for Small Language Models
Paper
• 2503.08681
• Published
• 15
MathFusion: Enhancing Mathematic Problem-solving of LLM through
Instruction Fusion
Paper
• 2503.16212
• Published
• 25
MetaLadder: Ascending Mathematical Solution Quality via
Analogical-Problem Reasoning Transfer
Paper
• 2503.14891
• Published
• 22
Temporal Consistency for LLM Reasoning Process Error Identification
Paper
• 2503.14495
• Published
• 11
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning
via Iterative Self-Improvement
Paper
• 2503.17352
• Published
• 24
Judge Anything: MLLM as a Judge Across Any Modality
Paper
• 2503.17489
• Published
• 23
JudgeLRM: Large Reasoning Models as a Judge
Paper
• 2504.00050
• Published
• 62
Inference-Time Scaling for Generalist Reward Modeling
Paper
• 2504.02495
• Published
• 58
Unicorn: Text-Only Data Synthesis for Vision Language Model Training
Paper
• 2503.22655
• Published
• 39
WikiVideo: Article Generation from Multiple Videos
Paper
• 2504.00939
• Published
• 37
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist
Policy
Paper
• 2503.24388
• Published
• 29
YourBench: Easy Custom Evaluation Sets for Everyone
Paper
• 2504.01833
• Published
• 22
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
Paper
• 2504.01943
• Published
• 15
GenPRM: Scaling Test-Time Compute of Process Reward Models via
Generative Reasoning
Paper
• 2504.00891
• Published
• 14
DASH: Detection and Assessment of Systematic Hallucinations of VLMs
Paper
• 2503.23573
• Published
• 12
ActionStudio: A Lightweight Framework for Data and Training of Large
Action Models
Paper
• 2503.22673
• Published
• 12
T1: Tool-integrated Self-verification for Test-time Compute Scaling in
Small Language Models
Paper
• 2504.04718
• Published
• 43
MM-IFEngine: Towards Multimodal Instruction Following
Paper
• 2504.07957
• Published
• 35
MegaMath: Pushing the Limits of Open Math Corpora
Paper
• 2504.02807
• Published
• 35
Agentic Knowledgeable Self-awareness
Paper
• 2504.03553
• Published
• 27
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual
Reasoning Self-Improvement
Paper
• 2504.07934
• Published
• 21
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated
Agent-Human Interplay
Paper
• 2504.03601
• Published
• 17
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing
Skills
Paper
• 2504.07079
• Published
• 12
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for
Language Model Pre-training
Paper
• 2504.13161
• Published
• 93
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper
• 2504.10481
• Published
• 85
Iterative Self-Training for Code Generation via Reinforced Re-Ranking
Paper
• 2504.09643
• Published
• 34
Heimdall: test-time scaling on the generative verification
Paper
• 2504.10337
• Published
• 33
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in
Data Synthesis
Paper
• 2504.12322
• Published
• 28
Efficient Process Reward Model Training via Active Learning
Paper
• 2504.10559
• Published
• 13
MetaSynth: Meta-Prompting-Driven Agentic Scaffolds for Diverse Synthetic
Data Generation
Paper
• 2504.12563
• Published
• 4
MIG: Automatic Data Selection for Instruction Tuning by Maximizing
Information Gain in Semantic Space
Paper
• 2504.13835
• Published
• 38
Process Reward Models That Think
Paper
• 2504.16828
• Published
• 18