Learning to Memorize with Attributive and Associative Memory for Online Test-Time Adaptation of Vision-Language Models
Yuchao Zhang ⋅ Eric Wang ⋅ Fan Zhang ⋅ Haoxuan Li ⋅ Yisen Wang ⋅ Zhouchen Lin ⋅ Jun Wang ⋅ Qirui Mi ⋅ Mengyue Yang
Abstract
Memory-based test-time adaptation (TTA) assigns streaming test samples to class-specific memory slots based on pseudo-labels predicted by models like CLIP, and retrieves them to facilitate subsequent predictions under distribution shift. However, this process introduces two challenges: ❶ **Each sample is hard-assigned to a single class based on CLIP's prediction**, so inaccurate CLIP predictions contaminate the memory and bias subsequent predictions. ❷ **Samples are evicted under biased selection because memory capacity is fixed**, which risks discarding informative samples and undermining the efficacy of the memory. To address these challenges, we propose **A$^{2}$Memory** (**A**ttributive-**A**ssociative **Memory** for Test-time Adaptation). For challenge ❶, we propose *Attribute-centric Memory Construction*, which builds prior textual representations from class-shared, representative, and diverse visual attributes, and applies soft assignment to generate surrogate visual representations. For challenge ❷, we design *Class-wise Associative Memory*, which dynamically compresses streaming samples into fixed-capacity memory through gradient-based optimization and data-dependent retention, then retrieves sample-adaptive class prototypes for reliable inference. Extensive experiments demonstrate consistent improvements over state-of-the-art methods across 15 benchmarks.
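To make the two components concrete, below is a minimal PyTorch sketch of the soft assignment and the fixed-capacity class-wise memory described above. The shapes, the reconstruction-style write loss, the attention-based readout, and all names (`soft_assign`, `ClasswiseAssociativeMemory`, `attribute_text_feats`, slot counts) are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F


def soft_assign(image_feat, attribute_text_feats, temperature=0.01):
    """Softly assign one test image over class-shared attribute prototypes.

    image_feat:           (d,)      L2-normalized CLIP image feature
    attribute_text_feats: (C, A, d) L2-normalized per-class attribute text features
    Returns (C,) soft class-assignment weights (assumed form, not the paper's exact rule).
    """
    sims = attribute_text_feats @ image_feat                    # (C, A) cosine similarities
    per_class = sims.mean(dim=-1)                               # aggregate attributes per class
    return F.softmax(per_class / temperature, dim=0)            # (C,)


class ClasswiseAssociativeMemory:
    """Fixed-capacity slot matrix per class, updated online by a few gradient steps."""

    def __init__(self, num_classes, num_slots, dim, lr=0.1):
        self.slots = F.normalize(torch.randn(num_classes, num_slots, dim), dim=-1)
        self.lr = lr

    def write(self, feat, class_weights, steps=1):
        """Compress a streaming sample into memory (gradient-based retention sketch).

        feat:          (d,) normalized visual (or surrogate) feature
        class_weights: (C,) soft class-assignment weights for this sample
        """
        slots = self.slots
        for _ in range(steps):
            slots = slots.detach().requires_grad_(True)
            attn = F.softmax(slots @ feat, dim=-1)               # (C, K) slot attention
            recon = (attn.unsqueeze(-1) * slots).sum(dim=1)      # (C, d) per-class reconstruction
            # Each class memory should retain the sample in proportion to its assignment weight.
            loss = (class_weights * (1.0 - recon @ feat)).sum()
            grad, = torch.autograd.grad(loss, slots)
            slots = F.normalize(slots - self.lr * grad, dim=-1)
        self.slots = slots.detach()

    def read(self, feat):
        """Retrieve a sample-adaptive prototype per class by attention over slots."""
        attn = F.softmax(self.slots @ feat, dim=-1)              # (C, K)
        prototypes = (attn.unsqueeze(-1) * self.slots).sum(dim=1)  # (C, d)
        return F.normalize(prototypes, dim=-1)
```

In this sketch, `write` keeps capacity fixed (K slots per class) and lets a small number of gradient steps decide how much of each incoming sample is retained, while `read` returns prototypes conditioned on the query feature rather than a static class mean.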