Foundation models are redefining how AI systems are built. Practitioners now follow a standard procedure to build their machine learning solutions: starting from a pre-trained foundation model, they fine-tune its weights on the target task of interest. As a result, the Internet now hosts many fine-tunings of a handful of foundation models, each specialized to a different task; yet these individual fine-tunings exist in isolation, without benefiting from each other. In our opinion, this is a missed opportunity, as these specialized models contain rich and diverse features. In this paper, we thus propose model ratatouille, a new strategy to recycle the multiple fine-tunings of the same foundation model on diverse auxiliary tasks. Specifically, we repurpose these auxiliary weights as initializations for multiple parallel fine-tunings on the target task; we then average all the fine-tuned weights to obtain the final model. This recycling strategy aims to maximize diversity in weights by leveraging the diversity of the auxiliary tasks. Empirically, it improves the state of the art on the reference DomainBed benchmark for out-of-distribution generalization. Looking forward, this work contributes to the emerging paradigm of updatable machine learning where, akin to open-source software development, the community collaborates to reliably update machine learning models.
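The recipe described in the abstract can be summarized in a few lines of code. The sketch below is a minimal, hypothetical illustration rather than the authors' released implementation: the fine_tune training routine and the aux_models list of auxiliary fine-tunings are assumed placeholders, and all models are assumed to share the architecture of the same foundation model, which is what makes plain weight averaging meaningful.

import copy
import torch

def model_ratatouille(aux_models, target_dataset, fine_tune):
    # Step 1: parallel fine-tunings on the target task,
    # one per recycled auxiliary initialization.
    finetuned_states = []
    for aux in aux_models:
        model = copy.deepcopy(aux)        # start from the auxiliary weights
        fine_tune(model, target_dataset)  # user-provided training loop (placeholder)
        finetuned_states.append(model.state_dict())

    # Step 2: uniformly average all fine-tuned weights into a single model.
    averaged_state = {}
    for key, ref in finetuned_states[0].items():
        stacked = torch.stack([state[key].float() for state in finetuned_states])
        averaged_state[key] = stacked.mean(dim=0).to(ref.dtype)

    final_model = copy.deepcopy(aux_models[0])
    final_model.load_state_dict(averaged_state)
    return final_model

The uniform average in Step 2 is the simplest choice consistent with the abstract's description; in practice, any weight-averaging scheme over the parallel fine-tunings could be substituted.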
Author Information
Alexandre Rame (Sorbonne University)
I am a PhD student at Sorbonne University in Paris under the supervision of Professor Matthieu Cord. My thesis aims to improve the robustness of deep neural networks by leveraging their diversity.
Kartik Ahuja (FAIR (Meta AI))
Jianyu Zhang (New York University)
Matthieu Cord (Sorbonne University)
Leon Bottou (Meta AI)
David Lopez-Paz (Facebook AI Research)
More from the Same Authors
- 2020 : On the Equivalence of Bi-Level Optimization and Game-Theoretic Formulations of Invariant Risk Minimization »
  Kartik Ahuja
- 2022 : A Bias-Variance Analysis of Weight Averaging for OOD Generalization »
  Alexandre Ramé · Matthieu Kirchmeyer · Thibaud J Rahier · Alain Rakotomamonjy · Patrick Gallinari · Matthieu Cord
- 2023 : Cross-Risk Minimization: Inferring Groups Information for Improved Generalization »
  Mohammad Pezeshki · Diane Bouchacourt · Mark Ibrahim · Nicolas Ballas · Pascal Vincent · David Lopez-Paz
- 2023 : Identifiability of Discretized Latent Coordinate Systems via Density Landmarks Detection »
  Vitória Barin-Pacela · Kartik Ahuja · Simon Lacoste-Julien · Pascal Vincent
- 2023 : A Closer Look at In-Context Learning under Distribution Shifts »
  Kartik Ahuja · David Lopez-Paz
- 2023 : Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards »
  Alexandre Rame · Guillaume Couairon · Corentin Dancette · Jean-Baptiste Gaya · Mustafa Shukor · Laure Soulier · Matthieu Cord
- 2023 : Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards »
  Alexandre Rame · Guillaume Couairon · Corentin Dancette · Mustafa Shukor · Jean-Baptiste Gaya · Laure Soulier · Matthieu Cord
- 2023 Poster: Learning useful representations for shifting tasks and distributions »
  Jianyu Zhang · Leon Bottou
- 2023 Oral: Interventional Causal Representation Learning »
  Kartik Ahuja · Divyat Mahajan · Yixin Wang · Yoshua Bengio
- 2023 Oral: Why does Throwing Away Data Improve Worst-Group Error? »
  Kamalika Chaudhuri · Kartik Ahuja · Martin Arjovsky · David Lopez-Paz
- 2023 Poster: Why does Throwing Away Data Improve Worst-Group Error? »
  Kamalika Chaudhuri · Kartik Ahuja · Martin Arjovsky · David Lopez-Paz
- 2023 Poster: Interventional Causal Representation Learning »
  Kartik Ahuja · Divyat Mahajan · Yixin Wang · Yoshua Bengio
- 2022 : Discussion Panel »
  Percy Liang · Léon Bottou · Jayashree Kalpathy-Cramer · Alex Smola
- 2022 : Invited talks I, Q/A »
  Bernhard Schölkopf · David Lopez-Paz
- 2022 : Invited Talks 1, Bernhard Schölkopf and David Lopez-Paz »
  Bernhard Schölkopf · David Lopez-Paz
- 2022 Poster: Rich Feature Construction for the Optimization-Generalization Dilemma »
  Jianyu Zhang · David Lopez-Paz · Léon Bottou
- 2022 Spotlight: Rich Feature Construction for the Optimization-Generalization Dilemma »
  Jianyu Zhang · David Lopez-Paz · Léon Bottou
- 2022 Poster: Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization »
  Alexandre Rame · Corentin Dancette · Matthieu Cord
- 2022 Spotlight: Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization »
  Alexandre Rame · Corentin Dancette · Matthieu Cord
- 2021 Poster: Can Subnetwork Structure Be the Key to Out-of-Distribution Generalization? »
  Dinghuai Zhang · Kartik Ahuja · Yilun Xu · Yisen Wang · Aaron Courville
- 2021 Oral: Can Subnetwork Structure Be the Key to Out-of-Distribution Generalization? »
  Dinghuai Zhang · Kartik Ahuja · Yilun Xu · Yisen Wang · Aaron Courville
- 2020 Workshop: Workshop on Continual Learning »
  Haytham Fayek · Arslan Chaudhry · David Lopez-Paz · Eugene Belilovsky · Jonathan Richard Schwarz · Marc Pickett · Rahaf Aljundi · Sayna Ebrahimi · Razvan Pascanu · Puneet Dokania
- 2020 Poster: Invariant Risk Minimization Games »
  Kartik Ahuja · Karthikeyan Shanmugam · Kush Varshney · Amit Dhurandhar
- 2019 Poster: Manifold Mixup: Better Representations by Interpolating Hidden States »
  Vikas Verma · Alex Lamb · Christopher Beckham · Amir Najafi · Ioannis Mitliagkas · David Lopez-Paz · Yoshua Bengio
- 2019 Poster: First-Order Adversarial Vulnerability of Neural Networks and Input Dimension »
  Carl-Johann Simon-Gabriel · Yann Ollivier · Leon Bottou · Bernhard Schölkopf · David Lopez-Paz
- 2019 Poster: AdaGrad stepsizes: sharp convergence over nonconvex landscapes »
  Rachel Ward · Xiaoxia Wu · Leon Bottou
- 2019 Oral: AdaGrad stepsizes: sharp convergence over nonconvex landscapes »
  Rachel Ward · Xiaoxia Wu · Leon Bottou
- 2019 Oral: Manifold Mixup: Better Representations by Interpolating Hidden States »
  Vikas Verma · Alex Lamb · Christopher Beckham · Amir Najafi · Ioannis Mitliagkas · David Lopez-Paz · Yoshua Bengio
- 2019 Oral: First-Order Adversarial Vulnerability of Neural Networks and Input Dimension »
  Carl-Johann Simon-Gabriel · Yann Ollivier · Leon Bottou · Bernhard Schölkopf · David Lopez-Paz
- 2018 Poster: Optimizing the Latent Space of Generative Networks »
  Piotr Bojanowski · Armand Joulin · David Lopez-Paz · Arthur Szlam
- 2018 Oral: Optimizing the Latent Space of Generative Networks »
  Piotr Bojanowski · Armand Joulin · David Lopez-Paz · Arthur Szlam
- 2017 Poster: Wasserstein Generative Adversarial Networks »
  Martin Arjovsky · Soumith Chintala · Léon Bottou
- 2017 Talk: Wasserstein Generative Adversarial Networks »
  Martin Arjovsky · Soumith Chintala · Léon Bottou