Poster

SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

Max Ryabinin · Tim Dettmers · Michael Diskin · Alexander Borzunov
2023 Poster