SCoA: Revisiting Domain Generalized Object Detection with Style-Conditioned Adaptation
Abstract
Domain generalized object detection (DGOD) aims to train an object detector on a single source domain and generalize it to unseen target domains. Recent advances in DGOD have increasingly exploited vision foundation models (VFMs) via parameter-efficient finetuning strategies. However, existing approaches typically adapt VFMs with fixed, style-agnostic parameters, overlooking that different visual styles may induce distinct task discrepancies. To address this limitation, we propose SCoA, a novel Style-Conditioned Adaptation framework for dynamic, style-aware task compensation. Specifically, we introduce a Spectral Style Modeling (SSM) module that preserves local style cues via a memory-based mechanism, enabling diverse style characterization from a single source domain. Conditioned on the extracted style signals, we design a Mixture-of-Tokens Adaptation (MTA) mechanism, which maintains multiple adaptation tokens and dynamically routes each sample to a suitable combination of tokens, thereby explicitly modeling style-dependent task mismatches. In addition, we propose a Style-Conditioned Query Refinement (SCQR) module that injects style information into object queries, enabling a style-aware detection head. By jointly integrating these components, SCoA allows the model to follow style-specific adaptation trajectories, achieving effective and flexible task compensation for VFM-based DGOD. Extensive experiments demonstrate that the proposed SCoA achieves state-of-the-art performance across two challenging scenarios.
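As an illustration of the routing idea behind MTA, the following minimal sketch mixes a small bank of adaptation tokens using weights computed from a per-sample style vector. It is not the SCoA implementation: the linear gate, the top-k selection, the function name `route_tokens`, and all numeric values are assumptions chosen for illustration only, and the style vector stands in for whatever signal the SSM module would produce.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_tokens(style, gate_weights, tokens, top_k=2):
    """Mix adaptation tokens with weights derived from a style vector.

    style:        per-sample style descriptor (hypothetical stand-in)
    gate_weights: one weight vector per token, same length as `style`
    tokens:       one adaptation token (list of floats) per gate row
    top_k:        number of tokens kept per sample (assumed, not from the paper)
    """
    # Assumed linear gate: one score per token.
    scores = [sum(w * s for w, s in zip(row, style)) for row in gate_weights]
    probs = softmax(scores)
    # Keep the top_k most probable tokens and renormalize their weights.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:top_k])
    norm = sum(probs[i] for i in keep)
    weights = [probs[i] / norm if i in keep else 0.0 for i in range(len(probs))]
    # The sample-specific adaptation token is the weighted sum of the bank.
    dim = len(tokens[0])
    mixed = [
        sum(weights[i] * tokens[i][d] for i in range(len(tokens)))
        for d in range(dim)
    ]
    return weights, mixed


# Toy token bank and gate (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

# Two samples with different style vectors receive different token mixtures.
weights_a, mixed_a = route_tokens([2.0, 0.0], gate_weights, tokens)
weights_b, mixed_b = route_tokens([0.0, 2.0], gate_weights, tokens)
```

In this toy setting the two style vectors select the same pair of tokens but weight them in opposite proportions, so each sample ends up with its own adaptation token rather than a single fixed, style-agnostic one.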