This paper prescribes a distance between learning tasks modeled as joint distributions on data and labels. Using tools from information geometry, the distance is defined as the length of the shortest weight trajectory on a Riemannian manifold as a classifier is fitted to an interpolated task. The interpolated task evolves from the source task to the target task via an optimal transport formulation. This distance, which we call the "coupled transfer distance," can be compared across different classifier architectures. We develop an algorithm to compute the distance that iteratively transports the marginal on the data of the source task to that of the target task while updating the weights of the classifier to track this evolving data distribution. We develop theory to show that our distance captures the intuitive idea that a good transfer trajectory is one that keeps the generalization gap small during transfer, in particular at the end, on the target task. We perform thorough empirical validation and analysis across diverse image classification datasets to show that the coupled transfer distance correlates strongly with the difficulty of fine-tuning.
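To make the abstract's algorithm concrete, the sketch below shows one plausible shape of the computation. It is a minimal sketch, not the authors' reference implementation: the function and variable names (`coupled_transfer_distance`, `probe_x`, the batch lists) are hypothetical, the optimal-transport interpolation of the data distribution is crudely approximated by mixing source and target minibatches in proportion to the interpolation time t, and the Fisher-Rao length element is approximated by sqrt(2·KL) between successive predictive distributions on a fixed probe set.

```python
# A minimal sketch of the coupled-transfer computation, assuming PyTorch.
# Both approximations below (minibatch mixing in place of the OT
# interpolation, sqrt(2*KL) in place of the exact Fisher-Rao length
# element) are this sketch's assumptions, not the paper's method.
import torch
import torch.nn.functional as F

def coupled_transfer_distance(model, source_batches, target_batches,
                              probe_x, steps=10, lr=1e-3):
    """Fine-tune `model` along an interpolated task and accumulate an
    approximate Fisher-Rao length of the resulting weight trajectory."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    with torch.no_grad():
        prev_logp = F.log_softmax(model(probe_x), dim=-1)
    length = 0.0
    for k in range(1, steps + 1):
        t = k / steps  # interpolation time along the source-to-target path
        for (x_s, y_s), (x_t, y_t) in zip(source_batches, target_batches):
            # Stand-in for the OT-interpolated task: a fraction t of each
            # minibatch is drawn from the target task, the rest from the source.
            n = int(t * len(x_s))
            x = torch.cat([x_t[:n], x_s[n:]])
            y = torch.cat([y_t[:n], y_s[n:]])
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            logp = F.log_softmax(model(probe_x), dim=-1)
            # For nearby points on the statistical manifold,
            # ds ~ sqrt(2 * KL(p_prev || p)); KL is averaged over the probe set.
            kl = (prev_logp.exp() * (prev_logp - logp)).sum(-1).mean()
            length += float((2.0 * kl.clamp(min=0.0)).sqrt())
            prev_logp = logp
    return length
```

Because the trajectory length is accumulated in the predictive-distribution geometry rather than in raw parameter space, a number computed this way is comparable across classifier architectures, which is the property the abstract emphasizes.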
Author Information
Yansong Gao (University of Pennsylvania)
Pratik Chaudhari (University of Pennsylvania)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: An Information-Geometric Distance on the Space of Tasks »
  Thu. Jul 22nd, 01:25 -- 01:30 AM
More from the Same Authors
- 2021: Continuous Doubly Constrained Batch Reinforcement Learning »
  Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Pratik Chaudhari · Alex Smola
- 2023 Poster: The Value of Out-of-Distribution Data »
  Ashwin De Silva · Rahul Ramesh · Carey Priebe · Pratik Chaudhari · Joshua Vogelstein
- 2023 Poster: A Picture of the Space of Typical Learnable Tasks »
  Rahul Ramesh · Jialin Mao · Itay Griniasty · Rubing Yang · Han Kheng Teoh · Mark Transtrum · James Sethna · Pratik Chaudhari
- 2023 Workshop: New Frontiers in Learning, Control, and Dynamical Systems »
  Valentin De Bortoli · Charlotte Bunne · Guan-Horng Liu · Tianrong Chen · Maxim Raginsky · Pratik Chaudhari · Melanie Zeilinger · Animashree Anandkumar
- 2022 Poster: Does the Data Induce Capacity Control in Deep Learning? »
  Rubing Yang · Jialin Mao · Pratik Chaudhari
- 2022 Spotlight: Does the Data Induce Capacity Control in Deep Learning? »
  Rubing Yang · Jialin Mao · Pratik Chaudhari
- 2022 Poster: Deep Reference Priors: What is the best way to pretrain a model? »
  Yansong Gao · Rahul Ramesh · Pratik Chaudhari
- 2022 Spotlight: Deep Reference Priors: What is the best way to pretrain a model? »
  Yansong Gao · Rahul Ramesh · Pratik Chaudhari
- 2020 Poster: A Free-Energy Principle for Representation Learning »
  Yansong Gao · Pratik Chaudhari