Poster in Workshop: ICML 2024 Workshop on Foundation Models in the Wild
Two-Level Test-Time Adaptation in Multimodal Learning
Jixiang Lei · Franz Pernkopf
Keywords: [ Multimodal Learning ] [ test-time adaptation ] [ Fine-Tuning ]
Test-time adaptation (TTA) aims to adapt the parameters of a pre-trained source model using samples from the target domain, without access to the source data. Although recent studies have revealed the high potential of TTA in various computer vision tasks, most TTA methods are restricted to uni-modal adaptation, and the reliability bias caused by uni-modal data corruption has not been sufficiently discussed for multimodal tasks. Some of the most recent methods suppress the cross-modal information discrepancy (i.e., the reliability bias) by modulating a modality-sharing module, but they neglect domain adaptation of the modality-specific modules. In this paper, we propose a two-level test-time adaptation method (2LTTA) that accounts for both intra-modal distribution shift and cross-modal reliability bias in multimodal learning. 2LTTA modulates all normalization layers and self-attention modules in both the encoder of the corrupted modality and the modality-sharing block. In addition, we adopt a two-level objective function in the modality fusion block that addresses both intra-modal distribution shift and cross-modal reliability bias: Shannon entropy with sample reweighting is used to reduce the intra-modal distribution shift caused by data corruption, and a diversity-promoting loss is employed to reduce the cross-modal information discrepancy. Our experiments show that 2LTTA outperforms baseline methods on various datasets.
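The abstract does not state the exact form of either loss term, so the following is only a minimal, framework-free sketch of what such a two-level objective could look like. It assumes an EATA-style reweighting (samples whose entropy H is below a threshold e0 get weight exp(e0 - H), the rest are dropped) and a diversity term equal to the negative entropy of the batch-mean prediction. The function names, the threshold `e0` (defaulting to 0.4 * ln C, an assumption), and the weight `lam` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs: Sequence[float]) -> float:
    """H(p) = -sum_k p_k log p_k, treating 0 log 0 as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def reweighted_entropy_loss(batch_probs: Sequence[Sequence[float]], e0: float) -> float:
    """Hypothetical sample-reweighted entropy (EATA-style assumption).

    Samples with entropy H >= e0 are treated as unreliable and get weight 0;
    the others get weight exp(e0 - H), so more confident samples count more.
    """
    if not batch_probs:
        raise ValueError("batch must be non-empty")
    total = 0.0
    for probs in batch_probs:
        h = shannon_entropy(probs)
        if h < e0:
            total += math.exp(e0 - h) * h
    return total / len(batch_probs)


def diversity_loss(batch_probs: Sequence[Sequence[float]]) -> float:
    """Assumed diversity term: negative entropy of the batch-mean prediction.

    Minimizing it pushes the average prediction toward uniform, which
    discourages collapsing every sample onto the same class.
    """
    if not batch_probs:
        raise ValueError("batch must be non-empty")
    n = len(batch_probs)
    num_classes = len(batch_probs[0])
    mean = [sum(p[k] for p in batch_probs) / n for k in range(num_classes)]
    return -shannon_entropy(mean)


def two_level_objective(
    batch_logits: Sequence[Sequence[float]],
    lam: float = 1.0,
    e0: Optional[float] = None,
) -> float:
    """Reweighted entropy plus lam * diversity term on fused-head logits (sketch)."""
    if not batch_logits:
        raise ValueError("batch must be non-empty")
    probs = [softmax(logits) for logits in batch_logits]
    if e0 is None:
        # Assumed default threshold: 0.4 * ln(number of classes).
        e0 = 0.4 * math.log(len(probs[0]))
    return reweighted_entropy_loss(probs, e0) + lam * diversity_loss(probs)
```

In an actual TTA loop, this scalar would be minimized by gradient descent with respect to only the adapted parameters (here, per the abstract, the normalization layers and self-attention modules of the corrupted-modality encoder and the modality-sharing block); the gradient step is omitted because this sketch does not depend on any deep-learning framework.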