PASO: Step Parallel Stochastic Optimization
Jianrong Lu ⋅ Zhuoya Gu ⋅ Haobo Li ⋅ Zhiyu Zhu ⋅ Yechao Zhang ⋅ Jianhai Chen ⋅ Minghui Yang ⋅ Junwei Liu ⋅ Jian Wang ⋅ Qinming He ⋅ Hui Liu ⋅ Junhui Hou
Abstract
This paper addresses the fundamental challenge of accelerating inherently autoregressive gradient descent (GD) optimizers, such as SGD and Adam, from a dynamical-systems perspective. Specifically, we introduce a unified framework that recasts the autoregressive GD process as solving a system of triangular nonlinear equations (TNEs), thereby enabling \textit{step-parallel} training, in which gradients for different GD steps are computed concurrently without sequential dependencies. Within this generic framework, we establish that: (1) the TNE system admits a unique solution that coincides exactly with the autoregressive GD trajectory; and (2) solving the TNE system is guaranteed to converge to the GD trajectory in at most the same number of iterations. Building on these insights, we present \textit{PASO}, the first step-parallel optimizer for accelerating a broad class of GD-based optimizers, including SGD and Adam. Extensive experiments (\textit{e.g.}, on Llama-3.2-1B and diffusion models) validate that PASO achieves up to a \textbf{21}$\times$ reduction in GD steps and a \textbf{4.5}$\times$ wall-clock speedup, with no loss in model quality. Source code is available at: \url{https://anonymous.4open.science/r/PASO-0AF9}.
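To make the recasting concrete, consider a vanilla GD update $\theta_t = \theta_{t-1} - \eta\, g(\theta_{t-1})$ for $t = 1, \dots, T$, where $g$ denotes the (stochastic) gradient and $\theta_0$ is the fixed initialization. The residual notation $F_t$ and the Jacobi-style sweep below are an illustrative sketch of the step-parallel idea under this simplification, not necessarily PASO's exact formulation. The GD trajectory $(\theta_1, \dots, \theta_T)$ is the unique root of the system
\[
F_t(\theta_1, \dots, \theta_T) \;=\; \theta_t - \theta_{t-1} + \eta\, g(\theta_{t-1}) \;=\; 0, \qquad t = 1, \dots, T,
\]
which is (block) lower triangular because each $F_t$ involves only $\theta_{t-1}$ and $\theta_t$. A parallel fixed-point sweep
\[
\theta_t^{(k+1)} \;=\; \theta_{t-1}^{(k)} - \eta\, g\!\bigl(\theta_{t-1}^{(k)}\bigr), \qquad t = 1, \dots, T \ \text{concurrently},
\]
evaluates all $T$ gradients at once. Since $\theta_0$ is fixed, induction gives $\theta_t^{(k)} = \theta_t$ for all $t \le k$, so at most $T$ sweeps recover the exact trajectory, consistent with claim (2); any wall-clock gain comes from the sweep converging for all steps in far fewer than $T$ iterations.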