Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods that observe diverse paths require multiple training runs. In contrast, we aim to leverage both properties (1) and (2) with a single method and in a single training run. At a computational cost similar to training one model, we learn lines, curves, and simplexes of high-accuracy neural networks. These neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost. Moreover, using the subspace midpoint boosts accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging.
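A minimal sketch, assuming PyTorch, of how a line subspace of networks might be trained in a single run: each layer holds two weight endpoints, and every training step samples a point on the segment between them before computing the loss. The layer sizes, learning rate, and the SubspaceLinear name are illustrative assumptions rather than the authors' released implementation, and the full method's additional regularization encouraging the endpoints to stay diverse is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubspaceLinear(nn.Module):
    """Linear layer whose weights live on a line between two learned endpoints."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.w2 = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.b1 = nn.Parameter(torch.zeros(out_features))
        self.b2 = nn.Parameter(torch.zeros(out_features))

    def forward(self, x, alpha):
        # Interpolate the endpoints; alpha = 0.5 gives the subspace midpoint.
        w = (1 - alpha) * self.w1 + alpha * self.w2
        b = (1 - alpha) * self.b1 + alpha * self.b2
        return F.linear(x, w, b)

model = SubspaceLinear(784, 10)          # illustrative input/output sizes
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step(x, y):
    alpha = torch.rand(()).item()        # sample a point on the line each step
    loss = F.cross_entropy(model(x, alpha), y)
    opt.zero_grad()
    loss.backward()                      # gradients flow to both endpoints
    opt.step()
    return loss.item()
```

After training, evaluating several values of alpha yields diverse networks that can be ensembled, while alpha = 0.5 recovers the midpoint used for the accuracy, calibration, and label-noise results described above.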
Author Information
Mitchell Wortsman (University of Washington)
Maxwell Horton (Apple)
Carlos Guestrin (Stanford University & Apple)
Ali Farhadi (University of Washington, Allen Institute for AI)
Mohammad Rastegari (University of Washington)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Learning Neural Network Subspaces
  Wed. Jul 21st, 04:00 -- 06:00 AM
More from the Same Authors
- 2022: How well do contrastively trained models transfer?
  M. Moein Shariatnia · Rahim Entezari · Mitchell Wortsman · Olga Saukh · Ludwig Schmidt
- 2022 Poster: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Mitchell Wortsman · Gabriel Ilharco · Samir Gadre · Rebecca Roelofs · Raphael Gontijo Lopes · Ari Morcos · Hongseok Namkoong · Ali Farhadi · Yair Carmon · Simon Kornblith · Ludwig Schmidt
- 2022 Spotlight: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Mitchell Wortsman · Gabriel Ilharco · Samir Gadre · Rebecca Roelofs · Raphael Gontijo Lopes · Ari Morcos · Hongseok Namkoong · Ali Farhadi · Yair Carmon · Simon Kornblith · Ludwig Schmidt
- 2022 Poster: Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)
  Alex Fang · Gabriel Ilharco · Mitchell Wortsman · Yuhao Wan · Vaishaal Shankar · Achal Dave · Ludwig Schmidt
- 2022 Spotlight: Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)
  Alex Fang · Gabriel Ilharco · Mitchell Wortsman · Yuhao Wan · Vaishaal Shankar · Achal Dave · Ludwig Schmidt
- 2020 Poster: Soft Threshold Weight Reparameterization for Learnable Sparsity
  Aditya Kusupati · Vivek Ramanujan · Raghav Somani · Mitchell Wortsman · Prateek Jain · Sham Kakade · Ali Farhadi
- 2020 Poster: AdaScale SGD: A User-Friendly Algorithm for Distributed Training
  Tyler Johnson · Pulkit Agrawal · Haijie Gu · Carlos Guestrin
- 2019 Poster: Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment
  Chen Huang · Shuangfei Zhai · Walter Talbott · Miguel Angel Bautista Martin · Shih-Yu Sun · Carlos Guestrin · Joshua M Susskind
- 2019 Oral: Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment
  Chen Huang · Shuangfei Zhai · Walter Talbott · Miguel Angel Bautista Martin · Shih-Yu Sun · Carlos Guestrin · Joshua M Susskind
- 2017 Poster: StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent
  Tyler Johnson · Carlos Guestrin
- 2017 Talk: StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent
  Tyler Johnson · Carlos Guestrin