Spotlight in Workshop: ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Wei Fu ⋅ Jiaxuan Gao ⋅ Shusheng Xu ⋅ Zhiyu Mei ⋅ Chen Zhu ⋅ Xujie Shen ⋅ Chuyi He ⋅ Guo Wei ⋅ Jun Mei ⋅ Jiashu Wang ⋅ Tongkai Yang ⋅ Binhang Yuan ⋅ Yi Wu


Abstract

Effective RL for LLMs requires massive parallelization and poses an urgent need for efficient training systems. Most existing large-scale RL systems for LLMs are synchronous: they alternate generation and training in a batch setting, so the rollouts in each training batch are generated by the same (or latest) model. This stabilizes RL training but is severely inefficient at the system level, because generation must wait for the longest output in the batch to finish before the model can be updated, leaving GPUs underutilized. We present AReaL, a fully asynchronous RL system that completely decouples generation from training. Rollout workers in AReaL continuously generate new outputs without waiting, while training workers update the model whenever a batch of data is collected. To stabilize RL training, AReaL balances the workload of rollout and training workers to control data staleness and adopts a staleness-enhanced PPO variant to better handle outdated training samples. Extensive experiments on math and code reasoning benchmarks show that AReaL achieves up to 2.57× training speedup over the best synchronous systems with the same number of GPUs, with matched or even improved final performance.
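
To make the decoupling described in the abstract concrete, the following Python sketch simulates rollout workers that tag each sample with the policy version that produced it, and a trainer that updates whenever a batch is ready. It is a toy illustration, not AReaL's implementation: the names `Rollout`, `StalenessBoundedBuffer`, and `simulate` are hypothetical, and dropping samples older than `max_staleness` versions is an assumed policy chosen for simplicity (the abstract does not say how staleness is bounded).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Rollout:
    sample_id: int
    policy_version: int  # version of the weights that generated this sample


class StalenessBoundedBuffer:
    """Hypothetical buffer: rollouts enter freely, but a training batch only
    contains samples at most `max_staleness` versions behind the trainer."""

    def __init__(self, batch_size: int, max_staleness: int):
        self.batch_size = batch_size
        self.max_staleness = max_staleness
        self.queue = deque()
        self.dropped = 0

    def put(self, rollout: Rollout) -> None:
        self.queue.append(rollout)

    def try_get_batch(self, current_version: int):
        # Assumed policy: discard samples that are too stale to train on.
        fresh = deque()
        for r in self.queue:
            if current_version - r.policy_version <= self.max_staleness:
                fresh.append(r)
            else:
                self.dropped += 1
        self.queue = fresh
        if len(self.queue) < self.batch_size:
            return None  # trainer has nothing to do yet
        return [self.queue.popleft() for _ in range(self.batch_size)]


def simulate(num_ticks: int, rollouts_per_tick: int,
             batch_size: int, max_staleness: int):
    """Deterministic toy loop: generation never waits for training."""
    buffer = StalenessBoundedBuffer(batch_size, max_staleness)
    version = 0
    next_id = 0
    staleness_seen = []
    for _ in range(num_ticks):
        # Generation side: produce samples with whatever weights are current.
        for _ in range(rollouts_per_tick):
            buffer.put(Rollout(next_id, version))
            next_id += 1
        # Training side: update as soon as a full batch is available.
        batch = buffer.try_get_batch(version)
        if batch is not None:
            staleness_seen.extend(version - r.policy_version for r in batch)
            version += 1  # a model update publishes new weights
    return version, staleness_seen, buffer.dropped


if __name__ == "__main__":
    version, staleness_seen, dropped = simulate(
        num_ticks=10, rollouts_per_tick=3, batch_size=2, max_staleness=1
    )
    print("updates:", version)
    print("max staleness trained on:", max(staleness_seen))
    print("samples dropped as too stale:", dropped)
```

In this sketch, generating faster than the trainer consumes leads to dropped samples, which hints at why balancing the workload of rollout and training workers matters for controlling staleness.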
