Article collection
EPO: Entropy-regularized Policy Optimization for LLM Agents
Reinforcement Learning
Paper • 2509.22576 • Published • 135
AgentBench: Evaluating LLMs as Agents
Paper • 2308.03688 • Published • 25
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 21
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Paper • 2306.00978 • Published • 11
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Paper • 2210.17323 • Published • 10
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
Paper • 2410.12854 • Published • 1
KTO: Model Alignment as Prospect Theoretic Optimization
Paper • 2402.01306 • Published • 21
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 24