MTraining: Efficient Distributed Training for Ultra-Long Contexts via Dynamic Sparse Attention

Wenxuan Li ⋅ Chengruidong Zhang ⋅ Huiqiang Jiang ⋅ Yucheng Li ⋅ Yuqing Yang ⋅ Lili Qiu

Abstract
