MTraining: Efficient Distributed Training for Ultra-Long Contexts via Dynamic Sparse Attention

Wenxuan Li · Chengruidong Zhang · Huiqiang Jiang · Yucheng Li · Yuqing Yang · Lili Qiu

Abstract
