Poster Mon, Jul 6, 2026 • 6:30 PM – 8:15 PM PDT HALL A #800

SCoA: Revisiting Domain Generalized Object Detection with Style-Conditioned Adaptation

Han Jiang ⋅ Wenfei Yang ⋅ Tianzhu Zhang ⋅ Yongdong Zhang

Abstract

Domain generalized object detection (DGOD) aims to train an object detector on a single source domain and generalize it to unseen target domains. Recent advances in DGOD have increasingly exploited vision foundation models (VFMs) via parameter-efficient finetuning strategies. However, existing approaches typically adapt VFMs with fixed, style-agnostic parameters, overlooking that different visual styles may induce distinct task discrepancies. To address this challenge, we propose SCoA, a novel Style-Conditioned Adaptation framework for dynamic, style-aware task compensation. Specifically, we introduce a Spectral Style Modeling (SSM) module that preserves local style cues via a memory-based mechanism, enabling diverse style characterization from a single source domain. Conditioned on the extracted style signals, we design a Mixture-of-Tokens Adaptation (MTA) mechanism, which maintains multiple adaptation tokens and dynamically routes each sample to an optimal combination of tokens, thereby explicitly modeling style-dependent task mismatches. In addition, we propose a Style-Conditioned Query Refinement (SCQR) module that injects style information into object queries, enabling a style-aware detection head. By jointly integrating these components, SCoA allows the model to follow style-specific adaptation trajectories, achieving effective and flexible task compensation for VFM-based DGOD. Extensive experiments demonstrate that the proposed SCoA achieves state-of-the-art performance across two challenging scenarios.
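
The routing idea behind MTA, several adaptation tokens mixed per sample according to a style signal, can be illustrated with a generic sparse gating sketch. This is not the authors' implementation: the linear router, the top-k selection, the softmax weighting, and every name below (`route_tokens`, `gate_weights`, `top_k`) are assumptions introduced only for illustration, and the abstract does not specify how tokens are scored or combined.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_tokens(style, gate_weights, tokens, top_k=2):
    """Mix adaptation tokens for one sample, conditioned on its style vector.

    style:        list of floats (hypothetical style descriptor)
    gate_weights: K rows, one per token, each as long as `style`
                  (hypothetical linear router, assumed for this sketch)
    tokens:       K adaptation tokens, each a list of floats
    Returns (mixed_token, weights); weights has K entries that sum to 1.
    """
    # One router score per token, linear in the style vector.
    logits = [sum(w * s for w, s in zip(row, style)) for row in gate_weights]

    # Keep the top_k highest-scoring tokens and normalise over them only.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    probs = softmax([logits[i] for i in keep])

    weights = [0.0] * len(tokens)
    for index, prob in zip(keep, probs):
        weights[index] = prob

    # Weighted sum of the tokens, computed dimension by dimension.
    dim = len(tokens[0])
    mixed = [
        sum(weights[k] * tokens[k][d] for k in range(len(tokens)))
        for d in range(dim)
    ]
    return mixed, weights


# Toy example: three 2-D tokens and two made-up style vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
mixed_a, weights_a = route_tokens([2.0, 0.0], gate_weights, tokens)
mixed_b, weights_b = route_tokens([0.0, 2.0], gate_weights, tokens)
```

In this toy setup the two style vectors favour different tokens, so the same token bank yields two different mixed tokens. That is the property the abstract refers to as style-dependent adaptation, shown here only at the level of the gating arithmetic.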

Lay Summary

Object detectors often perform poorly when deployed in new environments with different visual conditions, such as fog, rain, or nighttime scenes. Existing approaches improve robustness by adapting large vision foundation models, but they usually rely on fixed adaptation strategies that treat all visual styles in the same way. We wanted to investigate whether object detectors could instead adapt dynamically to different styles encountered in unseen domains. Our paper introduces SCoA, a style-conditioned adaptation framework that enables the detector to adjust its behavior according to the visual style of each input image. Specifically, our method learns diverse style patterns from a single training domain and uses them to guide both feature adaptation and object query refinement. This allows the model to follow different adaptation paths for different styles, leading to more flexible and effective generalization. Experiments on challenging benchmarks show that SCoA significantly improves detection performance under diverse weather conditions and image corruptions.