Multi-Label Test-Time Adaptation with Bayesian Conditional Priors
Qiru Li ⋅ Ao Zhou ⋅ Zhiwei Jiang ⋅ Zifeng Cheng ⋅ Cong Wang ⋅ Yafeng Yin ⋅ Qing Gu
Abstract
Vision--language models such as CLIP have shown strong zero-shot performance, but their reliability degrades in realistic multi-label settings under distribution shift. Standard test-time adaptation (TTA) methods either rely on costly gradient-based updates or adopt lightweight statistical schemes that implicitly assume label independence. The latter is particularly harmful in multi-label scenarios, where visually dominant classes suppress semantically correlated yet weaker labels, leading to severe recall loss. We revisit multi-label TTA from a Bayesian perspective and propose Bayesian Conditional Priors~(BCP) estimation, a backpropagation-free framework that injects label dependencies into CLIP's zero-shot predictions. Treating the zero-shot scores as approximate marginal posteriors, BCP derives an anchor-conditioned Bayesian refinement in which each logit is corrected by a term determined solely by the conditional prior $P(c_i=1 \mid c_a=1)$. These conditional priors are estimated online via second-order co-occurrence statistics over the test stream and instantiated as closed-form, monotonic logit transformations, without backpropagation or architectural changes. Experiments on multi-label benchmarks show that this structure-aware adaptation consistently improves mean average precision over entropy-based and retrieval-augmented TTA baselines, while incurring negligible computational overhead.
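The pipeline described in the abstract can be sketched concretely: maintain second-order co-occurrence counts over thresholded pseudo-labels from the test stream, read off the conditional prior $P(c_i=1 \mid c_a=1)$ for an anchor class $a$, and apply a monotonic logit correction. This is a minimal illustrative sketch only; the function names, the pseudo-label threshold, the $\lambda$-scaled log-prior correction term, and the choice to leave the anchor logit unchanged are all assumptions for illustration, not the paper's exact closed-form transformation.

```python
import numpy as np

def update_cooccurrence(counts, scores, thresh=0.5):
    """Accumulate second-order co-occurrence statistics online from one
    test image's zero-shot scores (hypothetical estimator: hard
    pseudo-labels obtained by thresholding the scores)."""
    pseudo = (scores > thresh).astype(np.float64)  # binary pseudo-labels
    counts += np.outer(pseudo, pseudo)             # joint counts N(i, j)
    return counts

def refine_logits(logits, scores, counts, lam=1.0, eps=1e-6):
    """Anchor-conditioned refinement: shift each class logit by a term
    driven by its conditional prior given the anchor (top-scoring) class.
    The log-prior form of the correction is an illustrative assumption."""
    anchor = int(np.argmax(scores))  # most confident class as anchor
    # Empirical estimate of P(c_i = 1 | c_a = 1) from co-occurrence counts
    cond = counts[:, anchor] / (counts[anchor, anchor] + eps)
    refined = logits + lam * np.log(cond + eps)  # monotone logit shift
    refined[anchor] = logits[anchor]             # leave the anchor untouched
    return refined
```

Intuitively, a weak label that frequently co-occurs with the anchor receives a near-zero penalty (its conditional prior is close to one), while an unrelated label is pushed down, which is how correlated-but-suppressed labels can be recovered without any gradient updates.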