Poster

One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining

Philip Zmushko ⋅ Egor Petrov ⋅ Nursultan Abdullaev ⋅ Khrushchev Mikhail ⋅ Samuel Horváth

Abstract
