DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off Paper • 2604.13902 • Published 13 days ago
How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning Paper • 2602.10622 • Published Feb 11