Machine learning models are frequently deployed in settings where the data distribution shifts gradually over time. However, existing methods for adapting to distribution shift fail to properly leverage this implicit temporal structure. We propose a simple method to continually adapt a model to distribution shift by using the exponential moving average of model weights over discrete timesteps. We refer to our method as Continually and Stochastically Averaging Weights (CSAW). We show that CSAW achieves state-of-the-art performance on the Wild-Time benchmark of in-the-wild gradual temporal distribution shifts on a variety of datasets across vision, language, and medical domains, with improvements in both average and worst-case OOD performance: +2.23% accuracy on Yearbook, +2.96% on FMoW, +0.87% on HuffPost, +1.43% on ArXiv, and +0.75% ROC-AUC on MIMIC-Mortality. We analyze the loss landscapes of sequentially fine-tuned models and show that they exhibit favorable mode connectivity properties, which allow for weight averaging.
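To make the core idea concrete, the following is a minimal sketch of an exponential moving average of model weights taken across discrete timesteps. It is a generic illustration, not the authors' implementation: the abstract does not specify the exact update rule, the decay value, or how averaging interleaves with fine-tuning, so the function names (`ema_update`, `average_over_timesteps`), the `decay` parameter, and the toy checkpoints are all assumptions.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def ema_update(avg: Weights, new: Weights, decay: float) -> Weights:
    """Return decay * avg + (1 - decay) * new, parameter by parameter."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    return {
        name: [decay * a + (1.0 - decay) * n for a, n in zip(avg[name], new[name])]
        for name in avg
    }


def average_over_timesteps(checkpoints: List[Weights], decay: float) -> Weights:
    """Fold a sequence of per-timestep weights into a single running average."""
    avg = checkpoints[0]
    for weights in checkpoints[1:]:
        avg = ema_update(avg, weights, decay)
    return avg


# Toy checkpoints standing in for a model fine-tuned on three successive timesteps.
checkpoints = [{"w": [0.0, 2.0]}, {"w": [1.0, 2.0]}, {"w": [2.0, 4.0]}]
averaged = average_over_timesteps(checkpoints, decay=0.5)
print(averaged)  # {'w': [1.25, 3.0]}
```

In this sketch a larger `decay` keeps the averaged weights closer to earlier timesteps, while a smaller one tracks the most recent checkpoint more closely; the value 0.5 is chosen only to keep the arithmetic easy to follow.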
Author Information
Jared Fernandez (Carnegie Mellon University)
Saujas Vaduguru (Carnegie Mellon University)
Sanket Vaibhav Mehta (Carnegie Mellon University)
Yonatan Bisk (Carnegie Mellon University)
Emma Strubell (Carnegie Mellon University)
More from the Same Authors
-
2023 : Making Scalable Meta Learning Practical »
Sang Keun Choe · Sanket Vaibhav Mehta · Hwijeen Ahn · Willie Neiswanger · Pengtao Xie · Emma Strubell · Eric Xing -
2023 : The Framework Tax: Disparities Between Inference Efficiency in Research and Deployment »
Jared Fernandez · Jacob Kahn · Clara Na · Yonatan Bisk · Emma Strubell -
2023 : Dissecting Efficient Architectures for Wake-Word Detection »
Cody Berger · Juncheng Li · Yiyuan Li · Aaron Berger · Dmitri Berger · Karthik Ganesan · Emma Strubell · Florian Metze -
2023 : Conditional Diffusion Replay for Continual Learning in Medical Settings »
Yewon Byun · Saurabh Garg · Sanket Vaibhav Mehta · Praveer Singh · Jayashree Kalpathy-Cramer · Bryan Wilder · Zachary Lipton -
2023 : Prompt-based Generative Replay: A Text-to-Image Approach for Continual Learning in Medical Settings »
Yewon Byun · Saurabh Garg · Sanket Vaibhav Mehta · Jayashree Kalpathy-Cramer · Praveer Singh · Bryan Wilder · Zachary Lipton -
2022 Poster: A Framework for Learning to Request Rich and Contextually Useful Information from Humans »
Khanh Nguyen · Yonatan Bisk · Hal Daumé III -
2022 Spotlight: A Framework for Learning to Request Rich and Contextually Useful Information from Humans »
Khanh Nguyen · Yonatan Bisk · Hal Daumé III -
2022 Poster: Symmetric Machine Theory of Mind »
Melanie Sclar · Graham Neubig · Yonatan Bisk -
2022 Spotlight: Symmetric Machine Theory of Mind »
Melanie Sclar · Graham Neubig · Yonatan Bisk -
2021 : Oral3 »
Sanket Vaibhav Mehta -
2021 Poster: Few-shot Language Coordination by Modeling Theory of Mind »
Hao Zhu · Graham Neubig · Yonatan Bisk -
2021 Spotlight: Few-shot Language Coordination by Modeling Theory of Mind »
Hao Zhu · Graham Neubig · Yonatan Bisk