Poster

Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers

Anrui Chen ⋅ Ruijun Huang ⋅ Xin Zhang ⋅ Fang DONG(董方) ⋅ Hengjie Cao ⋅ Zhendong Huang ⋅ Yifeng Yang ⋅ Mengyi Chen ⋅ Jixian Zhou ⋅ Mingzhi Dong ⋅ Yujiang Wang ⋅ Jinlong Hou ⋅ Qin Lv ⋅ Robert Dick ⋅ Yuan Cheng ⋅ Tun Lu ⋅ Fan Yang ⋅ Li Shang

Abstract
