arxiv:2511.18871

Periodic Asynchrony: An On-Policy Approach for Accelerating LLM Reinforcement Learning

Published on Apr 28

Authors:

Abstract

A periodically asynchronous reinforcement learning framework separates inference and training deployment to improve training throughput on NPU platforms while maintaining accuracy and generalizing to GPU architectures.

AI-generated summary

Since the introduction of the GRPO algorithm, reinforcement learning~(RL) has attracted increasing attention for LLM post-training, yet training efficiency remains a critical challenge. In mainstream RL frameworks, inference and training are co-located on the same devices, and their synchronous execution prevents concurrent inference and training. In this work, we revisit the strategy of separating inference and training deployment, and propose a periodically asynchronous framework that transforms synchronous RL training into an asynchronous producer--consumer pipeline. By synchronising model weights at the beginning of each training iteration and generating all rollouts from the same policy, the proposed framework remains inherently on-policy, avoiding the off-policy bias introduced by existing asynchronous approaches without any modification to standard RL algorithms. We further introduce a unified tri-model architecture and a shared-prompt attention mechanism to support efficient asynchronous execution and reduce redundant computation. Experiments on NPU platforms show that the proposed framework achieves around 2times throughput improvement from asynchronous execution, with additional gains from system-level optimisations, substantially outperforming mainstream RL frameworks in end-to-end training throughput while maintaining comparable accuracy. Further validation on GPU platforms confirms that the proposed framework generalises effectively across hardware architectures, indicating its potential for widespread application.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2511.18871

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2511.18871 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2511.18871 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2511.18871 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.