ICML One-Versus-Others Attention: Scalable Multimodal Integration for Biomedical Data

Spotlight in Workshop: Accessible and Efficient Foundation Models for Biological Discovery

Michal Golovanevsky · Eva Schiller · Akira Nair · Ritambhara Singh · Carsten Eickhoff

Keywords: Multimodal Learning, Clinical Decision Support, Deep Learning, Biomedical Data, Scalability

Abstract:

Multimodal models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on vision-language applications, where the number of modalities rarely exceeds four (images, text, audio, video). However, data in other domains, such as healthcare, may include many more modalities like X-rays, PET scans, MRIs, genetic screening, genomic data, and clinical notes, creating a need for both efficient and accurate data integration. Many multimodal foundation models rely on cross-attention or self-attention for effective data integration, neither of which scales well for applications with more than two modalities. The complexity per layer of computing attention in either paradigm is, at best, quadratic with respect to the number of modalities, posing a computational bottleneck that impedes broad adoption. To address this, we propose a new attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities, thus offering a significant reduction in computational complexity compared to existing multimodal attention methods. Using three biomedical datasets with diverse modalities, we show that our method decreases computation costs while increasing performance compared to popular integration techniques. Across all datasets, OvO reduced the number of required floating point operations (FLOPs) by at least 91.98%, demonstrating its significant impact on efficiency and enabling wider adoption.
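
The quadratic-versus-linear scaling claim can be illustrated with a small counting sketch. This is not the authors' implementation: it assumes that a pairwise cross-attention scheme runs one attention computation for every ordered pair of distinct modalities, and that a one-versus-others scheme runs one attention computation per modality. The function names are hypothetical, and the counts are numbers of attention computations, not FLOPs, so they do not reproduce the 91.98% figure.

```python
# Illustrative counting sketch (assumptions, not the paper's code):
# - pairwise cross-attention: one attention computation per ordered
#   pair of distinct modalities, i.e. n * (n - 1)
# - one-versus-others: one attention computation per modality, i.e. n

def pairwise_cross_attention_ops(num_modalities: int) -> int:
    """Attention computations when each modality attends to every other one."""
    return num_modalities * (num_modalities - 1)


def one_versus_others_ops(num_modalities: int) -> int:
    """Attention computations when each modality attends once to the rest."""
    return num_modalities


def comparison_table(max_modalities: int) -> list[tuple[int, int, int]]:
    """Rows of (n, pairwise count, one-versus-others count) for n = 2..max."""
    return [
        (n, pairwise_cross_attention_ops(n), one_versus_others_ops(n))
        for n in range(2, max_modalities + 1)
    ]


if __name__ == "__main__":
    print("modalities  pairwise  one-vs-others")
    for n, pairwise, ovo in comparison_table(6):
        print(f"{n:>10}  {pairwise:>8}  {ovo:>13}")
```

Under these assumptions the two counts coincide at two modalities and diverge as modalities are added, which is the regime the abstract targets.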
