Poster in Workshop: ES-FoMo II: 2nd Workshop on Efficient Systems for Foundation Models

OutEffHop: A Principled Outlier-Efficient Attention Layer from Dense Associative Memory Models

Haozheng Luo · Jerry Yao-Chieh Hu · Pei-Hsuan Chang · Hong-Yu Chen · Weijian Li · Wei-Po Wang · Han Liu


Abstract:

We introduce a principled approach to Outlier-Efficient Attention Layers via associative memory models to reduce outlier emergence in large transformer-based models. Our main contribution is a novel associative memory model that facilitates outlier-efficient associative memory retrievals. This model subsumes the outlier-efficient attention mechanism (Softmax_1) as a special case of its memory retrieval process. Methodologically, this enables the introduction of novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, offering superior post-quantization performance. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models, including BERT, OPT, ViT, and STanHop-Net, benchmarking against state-of-the-art methods such as Clipped_Softmax and Gated_Attention. Notably, our method achieves a reduction of over 22% in average kurtosis and over 26% in the maximum infinity norm of model outputs across the four models, without sacrificing model performance after quantization.
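For intuition on the Softmax_1 mechanism the abstract refers to, the sketch below assumes the commonly cited formulation in which the softmax normalizer is augmented by 1, giving each query an implicit "null" slot so attention rows need not sum to 1 and no-op heads need not emit large-magnitude activations. The function names and the single-head setup are illustrative, not the authors' implementation.

```python
import numpy as np

def softmax_1(scores, axis=-1):
    """Softmax with an extra 1 in the denominator (Softmax_1).

    softmax_1(x)_i = exp(x_i) / (1 + sum_j exp(x_j))
    """
    # Shift by the max for numerical stability; the implicit zero logit
    # of the null slot is shifted consistently (exp(0 - m) = exp(-m)).
    m = np.max(scores, axis=axis, keepdims=True)
    e = np.exp(scores - m)
    return e / (np.exp(-m) + e.sum(axis=axis, keepdims=True))

def outlier_efficient_attention(Q, K, V):
    """Single-head scaled dot-product attention using Softmax_1."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax_1(scores, axis=-1) @ V

# Toy usage: 4 queries, 6 keys/values, head dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, 8)) for n in (4, 6, 6))
out = outlier_efficient_attention(Q, K, V)
print(out.shape)  # (4, 8); each row's attention weights sum to <= 1
```

Because each attention row can sum to less than 1, a head that has nothing useful to attend to can simply produce near-zero output instead of concentrating probability mass on a few tokens, which is the behavior associated with activation outliers and degraded post-quantization accuracy.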
