Poster

Rethinking Convergence in MoE Training: The Role of Routing Sparsity

Weihao Zhu ⋅ Long Shi ⋅ Kang Wei ⋅ Zhe Wang ⋅ Yipeng Zhou ⋅ Haixia Zhang

Abstract
