TiME: Test-Time Mixture-of-Experts Routing via Asymmetric CO-Optimal Transport for Continual Test-Time Adaptation
Abstract
Large language models often face continuous domain shifts at test time, which degrade performance on unseen, shifting domains. Continual test-time adaptation (CTTA) has therefore been proposed to adapt a model to evolving test domains while preserving knowledge of previous domains, that is, to balance adaptability and stability (the A-S balance). Existing CTTA methods are constrained by dense base models, which encode knowledge from all domains into a single global model and therefore struggle to achieve the A-S balance. We observe that the sparsity of mixture-of-experts (MoE) models makes them better suited to achieving the A-S balance than dense models. In CTTA, however, MoE models have difficulty (1) correctly routing samples from unseen, shifting domains and (2) capturing domain-level shifts. In this paper, we propose test-time mixture-of-experts routing (TiME) via asymmetric co-optimal transport (As-COOT), which models MoE routing in CTTA as a test-time allocation problem via co-optimal transport (COOT). To ensure reliable routing, we propose a semantic space alignment that aligns the sample and expert distributions via bidirectional contrastive learning. To address COOT’s limitations in CTTA, As-COOT relaxes the sample-side constraints while enforcing the expert-side constraints, which provides robustness to noise and balances expert load. Experiments show that TiME outperforms baselines. Code is available at anonymous.4open.science/r/As-COOT-78FF
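To make the asymmetric constraint concrete, the following is a minimal sketch of a semi-relaxed entropic transport plan, not the As-COOT formulation itself (which additionally couples a second transport plan, as in COOT). It assumes a given sample-to-expert cost matrix and enforces only the expert-side (column) marginals, leaving the sample-side (row) marginals free. The names `semi_relaxed_plan`, `route`, `cost`, `expert_mass`, and `eps` are hypothetical and chosen for illustration.

```python
import math


def semi_relaxed_plan(cost, expert_mass, eps=0.1):
    """Entropic transport plan with expert-side (column) marginals only.

    cost[i][j] is an assumed cost of sending sample i to expert j.
    Each column j of the returned plan sums to expert_mass[j];
    row sums (sample side) are left unconstrained.
    """
    n, m = len(cost), len(cost[0])
    plan = [[0.0] * m for _ in range(n)]
    for j in range(m):
        # Softmax over samples for expert j, shifted for numerical stability.
        logits = [-cost[i][j] / eps for i in range(n)]
        top = max(logits)
        weights = [math.exp(v - top) for v in logits]
        total = sum(weights)
        for i in range(n):
            plan[i][j] = expert_mass[j] * weights[i] / total
    return plan


def route(plan):
    """Assign each sample to the expert that carries most of its mass."""
    return [max(range(len(row)), key=row.__getitem__) for row in plan]


# Toy example: two clear samples and one ambiguous sample, two experts.
cost = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
plan = semi_relaxed_plan(cost, expert_mass=[0.5, 0.5])
assignment = route(plan)
```

In this toy setting each expert receives exactly its prescribed mass, while the ambiguous third sample ends up with less total mass than the clearly matched first sample, because nothing forces every sample to be fully transported.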