Transfer-learning methods aim to improve performance in a data-scarce target domain using a model pretrained on a data-rich source domain. A cost-efficient strategy, linear probing, involves freezing the source model and training a new classification head for the target domain. This strategy is outperformed by a more costly but state-of-the-art method -- fine-tuning all parameters of the source model to the target domain -- possibly because fine-tuning allows the model to leverage useful information from intermediate layers that is otherwise discarded by the later, previously trained layers. We explore the hypothesis that these intermediate layers might be directly exploited. We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target domain. In evaluations on the Visual Task Adaptation Benchmark-1k, Head2Toe matches the performance obtained with fine-tuning on average while reducing training and storage costs a hundredfold or more; critically, for out-of-distribution transfer, Head2Toe outperforms fine-tuning. Code used in our experiments can be found in the supplementary materials.
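The sketch below illustrates the probing idea described in the abstract -- freezing the source model and training only a linear head, but on features drawn from all intermediate layers rather than the last one. It is a minimal illustration, not the authors' implementation: the torchvision ResNet-18 backbone, the choice of hooked layers, the global-average pooling, and the class count are assumptions, and Head2Toe's feature-selection step is omitted.

```python
# Minimal sketch of head-to-toe probing (illustrative assumptions noted above).
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Frozen source model (assumed backbone; Head2Toe works with other architectures).
backbone = resnet18(weights="IMAGENET1K_V1")
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

# Capture intermediate activations with forward hooks.
features = {}
layer_names = ["layer1", "layer2", "layer3", "layer4"]

def save(name):
    def hook(module, inputs, output):
        # Global-average-pool spatial maps so every layer yields a flat vector.
        features[name] = output.mean(dim=(2, 3))
    return hook

for name in layer_names:
    getattr(backbone, name).register_forward_hook(save(name))

def head_to_toe_features(x):
    # Concatenate pooled features from all hooked layers ("head to toe").
    with torch.no_grad():
        backbone(x)
    return torch.cat([features[n] for n in layer_names], dim=1)

# Only the linear classification head is trained on the target task.
num_classes = 10  # illustrative target-domain class count
feat_dim = head_to_toe_features(torch.zeros(1, 3, 224, 224)).shape[1]
head = nn.Linear(feat_dim, num_classes)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# One training step on a dummy batch (replace with a target-domain data loader).
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, num_classes, (8,))
loss = nn.functional.cross_entropy(head(head_to_toe_features(x)), y)
loss.backward()
optimizer.step()
```

Because the backbone is never updated, only the small linear head needs to be trained and stored per target task, which is the source of the cost savings the abstract describes.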
Author Information
Utku Evci (Google)
Vincent Dumoulin (Google)
Hugo Larochelle (Google Brain)
Michael Mozer (Google Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Oral: Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning »
  Tue. Jul 19th 03:05 -- 03:25 PM, Room 318 - 320
More from the Same Authors
- 2022: Learning to induce causal structure »
  Rosemary Nan Ke · Silvia Chiappa · Jane Wang · Jorg Bornschein · Anirudh Goyal · Melanie Rey · Matthew Botvinick · Theophane Weber · Michael Mozer · Danilo J. Rezende
- 2023 Poster: Can Neural Network Memorization Be Localized? »
  Pratyush Maini · Michael Mozer · Hanie Sedghi · Zachary Lipton · Zico Kolter · Chiyuan Zhang
- 2023 Poster: Repository-Level Prompt Generation for Large Language Models of Code »
  Disha Shrivastava · Hugo Larochelle · Daniel Tarlow
- 2023 Poster: Discrete Key-Value Bottleneck »
  Frederik Träuble · Anirudh Goyal · Nasim Rahaman · Michael Mozer · Kenji Kawaguchi · Yoshua Bengio · Bernhard Schölkopf
- 2022 Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward »
  Huaxiu Yao · Hugo Larochelle · Percy Liang · Colin Raffel · Jian Tang · Ying WEI · Saining Xie · Eric Xing · Chelsea Finn
- 2022 Poster: The State of Sparse Training in Deep Reinforcement Learning »
  Laura Graesser · Utku Evci · Erich Elsen · Pablo Samuel Castro
- 2022 Spotlight: The State of Sparse Training in Deep Reinforcement Learning »
  Laura Graesser · Utku Evci · Erich Elsen · Pablo Samuel Castro
- 2021: Invited Talk #2 »
  Hugo Larochelle
- 2021 Poster: Learning a Universal Template for Few-shot Dataset Generalization »
  Eleni Triantafillou · Hugo Larochelle · Richard Zemel · Vincent Dumoulin
- 2021 Spotlight: Learning a Universal Template for Few-shot Dataset Generalization »
  Eleni Triantafillou · Hugo Larochelle · Richard Zemel · Vincent Dumoulin
- 2020 Poster: Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules »
  Sarthak Mittal · Alex Lamb · Anirudh Goyal · Vikram Voleti · Murray Shanahan · Guillaume Lajoie · Michael Mozer · Yoshua Bengio
- 2020 Poster: Revisiting Fundamentals of Experience Replay »
  William Fedus · Prajit Ramachandran · Rishabh Agarwal · Yoshua Bengio · Hugo Larochelle · Mark Rowland · Will Dabney
- 2020 Poster: Small-GAN: Speeding up GAN Training using Core-Sets »
  Samrath Sinha · Han Zhang · Anirudh Goyal · Yoshua Bengio · Hugo Larochelle · Augustus Odena