Poster in Workshop: ES-FoMo II: 2nd Workshop on Efficient Systems for Foundation Models
Fast Adaptation and Robust Quantization of Multi-Modal Foundation Models from Associative Memory: A Case Study in SpeechLM
Shang Wu · Yen-Ju Lu · Haozheng Luo · Jerry Yao-Chieh Hu · Jiayi Wang · Jing Liu · Najim Dehak · Jesus Villalba · Han Liu
We present a preliminary investigation into the outlier problem in multi-modal foundation models, with a focus on SpeechLM. Specifically, we consider SpeechLM models that use a pretrained LM as the backbone and are fine-tuned on multi-modal data (speech and text). Outliers arise both in the pretrained LLM and in the multi-modal inputs of SpeechLM. By adopting a principled approach inspired by associative memory models to address the outlier problem, we achieve significant improvements in three respects: faster low-rank adaptation, more accurate cross-modal fine-tuning, and more robust post-training quantization. Methodologically, we implement an outlier-efficient Hopfield layer to replace the conventional transformer attention mechanism. This adjustment effectively removes outliers, improving performance in multi-modal adaptation and in inference with quantized models. As a result, our proposed framework yields an average performance improvement of 7.98% in cross-modal fine-tuning and 67.85% in quantization, significantly outperforming standard frameworks in these respects.
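The abstract does not spell out the form of the outlier-efficient Hopfield layer; the sketch below is only one plausible reading, assuming the Hopfield retrieval step reduces to attention whose softmax carries an extra logit fixed at zero (a "softmax-1"), so that heads can assign little total weight to a query instead of being forced to concentrate probability mass on outlier tokens. All class and function names here (softmax_1, OutlierEfficientAttention) are hypothetical, not taken from the paper.

```python
# Illustrative sketch only, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def softmax_1(logits: torch.Tensor) -> torch.Tensor:
    """Softmax with an implicit extra zero logit: exp(z_i) / (1 + sum_j exp(z_j))."""
    zeros = torch.zeros_like(logits[..., :1])
    padded = torch.cat([logits, zeros], dim=-1)
    return F.softmax(padded, dim=-1)[..., :-1]


class OutlierEfficientAttention(nn.Module):
    """Drop-in replacement for multi-head self-attention (hypothetical naming)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (B, n_heads, T, d_head).
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2) for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        # softmax_1 lets a head attend to "nothing", which is the assumed
        # mechanism for suppressing activation outliers.
        attn = softmax_1(scores)
        return self.out((attn @ v).transpose(1, 2).reshape(B, T, D))


# Example: such a layer would replace the attention blocks of the SpeechLM
# backbone before low-rank adaptation and post-training quantization.
x = torch.randn(2, 16, 64)
layer = OutlierEfficientAttention(d_model=64, n_heads=4)
print(layer(x).shape)  # torch.Size([2, 16, 64])
```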