Gradient-Aware Scheduling: Coupling Curriculum and Staleness for Async Reinforcement Learning
Xinyu Zhang
Abstract
Asynchronous reinforcement learning enables high-throughput training but introduces *policy lag*, where experiences are collected under stale policy weights. We identify a key phenomenon in code generation: **gradient variance scales exponentially with task difficulty under staleness**, because hard tasks have narrow solution spaces corresponding to sharp loss-landscape curvature (high Hessian eigenvalues). We formalize this as a *staleness budget optimization problem* and prove that the optimal allocation follows an exponential decay, $\eta^*(d) = \eta_{\mathrm{base}} \cdot e^{-\lambda d}$, where $\lambda = \alpha/2$ is half the rate $\alpha$ at which Hessian curvature grows with difficulty. Building on this theory, we propose ACEAS (**A**daptive **C**urriculum with **E**xecution-Aware **A**sync **S**cheduling), which combines bandit-based curriculum selection, execution-aware staleness budgets, and a curriculum-staleness coupling derived from first principles. Our mechanistic analysis validates the theoretical predictions: the "safe zone" of gradient coherence follows the derived exponential boundary. On code generation benchmarks, ACEAS achieves over 2$\times$ higher throughput than synchronous baselines while improving Pass@1 from 39.7% to 60.1%, demonstrating that principled staleness control grounded in loss-landscape geometry enables efficient asynchronous curriculum learning.
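As a concrete illustration of the allocation rule $\eta^*(d) = \eta_{\mathrm{base}} \cdot e^{-\lambda d}$, the following is a minimal sketch of a difficulty-dependent staleness gate. It assumes difficulty is a nonnegative scalar and treats `eta_base` and `alpha` as free hyperparameters; the names `staleness_budget` and `admit_rollout` are illustrative, not taken from the ACEAS implementation.

```python
import math


def staleness_budget(difficulty: float,
                     eta_base: float = 8.0,
                     alpha: float = 0.5) -> float:
    """Maximum tolerated policy lag (in policy versions) for a task of
    the given difficulty, following eta*(d) = eta_base * exp(-lambda * d).

    eta_base and alpha are assumed hyperparameters here; in the paper,
    alpha is the rate at which Hessian curvature grows with difficulty.
    """
    lam = alpha / 2.0  # lambda = alpha / 2, half the Hessian growth rate
    return eta_base * math.exp(-lam * difficulty)


def admit_rollout(current_policy_version: int,
                  rollout_policy_version: int,
                  difficulty: float) -> bool:
    """Accept an asynchronously collected rollout only if its policy lag
    falls within the difficulty-dependent staleness budget."""
    lag = current_policy_version - rollout_policy_version
    return lag <= staleness_budget(difficulty)


if __name__ == "__main__":
    # Harder tasks get exponentially tighter staleness budgets.
    for d in (0.0, 2.0, 4.0, 8.0):
        print(f"difficulty={d:.0f}  budget={staleness_budget(d):.2f}")
```

Under these assumed values, a task of difficulty 0 tolerates a lag of 8 policy versions while a task of difficulty 8 tolerates roughly 0.15, matching the intuition that hard tasks must train on nearly fresh experience.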