DNA: Domain Generalization with Diversified Neural Averaging

Xu Chu · Yujie Jin · Wenwu Zhu · Yasha Wang · Xin Wang · Shanghang Zhang · Hong Mei

Ballroom 3 & 4
[ Abstract ] [ Livestream: Visit Probabilistic Methods/Applications ]
Tue 19 Jul 8:30 a.m. — 8:35 a.m. PDT
[ Slides [ Paper PDF

The inaccessibility of the target domain data causes domain generalization (DG) methods prone to forget target discriminative features, and challenges the pervasive theme in existing literature in pursuing a single classifier with an ideal joint risk. In contrast, this paper investigates model misspecification and attempts to bridge DG with classifier ensemble theoretically and methodologically. By introducing a pruned Jensen-Shannon (PJS) loss, we show that the target square-root risk w.r.t. the PJS loss of the $\rho$-ensemble (the averaged classifier weighted by a quasi-posterior $\rho$) is bounded by the averaged source square-root risk of the Gibbs classifiers. We derive a tighter bound by enforcing a positive principled diversity measure of the classifiers. We give a PAC-Bayes upper bound on the target square-root risk of the $\rho$-ensemble. Methodologically, we propose a diversified neural averaging (DNA) method for DG, which optimizes the proposed PAC-Bayes bound approximately. The DNA method samples Gibbs classifiers transversely and longitudinally by simultaneously considering the dropout variational family and optimization trajectory. The $\rho$-ensemble is approximated by averaging the longitudinal weights in a single run with dropout shut down, ensuring a fast ensemble with low computational overhead. Empirically, the proposed DNA method achieves the state-of-the-art classification performance on standard DG benchmark datasets.

Chat is not available.