Session
Transfer and Multi-Task Learning 1
Transfer Learning via Learning to Transfer
Ying WEI · Yu Zhang · Junzhou Huang · Qiang Yang
In transfer learning, what to transfer and how to transfer are the two primary issues to be addressed, because different transfer learning algorithms applied between a source and a target domain transfer different knowledge and therefore yield different performance improvements in the target domain. Determining the optimal algorithm, the one that maximizes the performance improvement, requires either exhaustive exploration or considerable expertise. Meanwhile, it is widely accepted in educational psychology that human beings improve their skill at deciding what to transfer through meta-cognitive reflection on inductive transfer learning practices. Motivated by this, we propose a novel transfer learning framework known as Learning to Transfer (L2T), which automatically determines what and how to transfer by leveraging previous transfer learning experiences. We establish the L2T framework in two stages: 1) we learn a reflection function that encodes transfer learning skills from experiences; and 2) we infer what and how to transfer for a future pair of domains by optimizing the reflection function. We also theoretically analyse the algorithmic stability and generalization bound of L2T, and empirically demonstrate its superiority over several state-of-the-art transfer learning algorithms.
Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory
Ron Amit · Ron Meir
In meta-learning an agent extracts knowledge from observed tasks, aiming to facilitate learning of novel future tasks. Under the assumption that future tasks are 'related' to previous tasks, representations should be learned in such a way that they capture the common structure across learned tasks, while allowing the learner sufficient flexibility to adapt to novel aspects of a new task. We present a framework for meta-learning that is based on generalization error bounds, allowing us to extend various PAC-Bayes bounds to meta-learning. Learning takes place through the construction of a distribution over hypotheses based on the observed tasks, and its utilization for learning a new task. Thus, prior knowledge is incorporated through setting an experience-dependent prior for novel tasks. We develop a gradient-based algorithm, and implement it for deep neural networks, based on minimizing an objective function derived from the bounds, and demonstrate its effectiveness numerically. In addition to establishing the improved performance available through meta-learning, we demonstrate the intuitive way by which prior information is manifested at different levels of the network.
Bilevel Programming for Hyperparameter Optimization and Meta-Learning
Luca Franceschi · Paolo Frasconi · Saverio Salzo · Riccardo Grazzi · Massimiliano Pontil
We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by taking into explicit account the optimization dynamics for the inner objective. Depending on the specific setting, the outer variables take either the meaning of hyperparameters in a supervised learning problem or parameters of a meta-learner. We provide sufficient conditions under which solutions of the approximate problem converge to those of the exact problem. We instantiate our approach for meta-learning in the case of deep learning where representation layers are treated as hyperparameters shared across a set of training episodes. In experiments, we confirm our theoretical findings, present encouraging results for few-shot learning and contrast the bilevel approach against classical approaches for learning-to-learn.
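The idea of solving an approximate bilevel problem by accounting for the inner optimization dynamics can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar objectives, the constants `a` and `b`, the step sizes, and the function names `unrolled_hypergradient` and `optimise_lambda` are all hypothetical. The inner problem is a one-dimensional regularized fit, the outer variable `lam` plays the role of a hyperparameter, and the hypergradient is obtained by propagating the derivative of the inner iterate with respect to `lam` through a fixed number of gradient steps.

```python
def unrolled_hypergradient(lam, a, b, eta=0.3, steps=50):
    """Run `steps` inner gradient steps; return (w_T, dE/dlam).

    Toy objectives (assumed for illustration only):
      inner:  L_lam(w) = 0.5 * (w - a)**2 + 0.5 * lam * w**2
      outer:  E(w)     = 0.5 * (w - b)**2
    """
    w = 0.0  # inner variable, initialised at zero
    z = 0.0  # z = d w_t / d lam, carried along with w (forward mode)
    for _ in range(steps):
        grad_w = (w - a) + lam * w
        # Differentiate the update w <- w - eta * grad_w with respect to lam.
        # This must use the value of w from before the update.
        z = z - eta * ((1.0 + lam) * z + w)
        w = w - eta * grad_w
    # Chain rule: dE/dlam = E'(w_T) * d w_T / d lam.
    return w, (w - b) * z


def optimise_lambda(a, b, lam=0.0, outer_lr=0.5, outer_steps=300):
    """Projected gradient descent on the outer variable, keeping lam >= 0."""
    for _ in range(outer_steps):
        _, hypergrad = unrolled_hypergradient(lam, a, b)
        lam = max(0.0, lam - outer_lr * hypergrad)
    return lam


lam_hat = optimise_lambda(a=2.0, b=1.0)
w_hat, _ = unrolled_hypergradient(lam_hat, a=2.0, b=1.0)
print(lam_hat, w_hat)
```

For this toy problem the exact inner solution is w = a / (1 + lam), so the exact outer optimum is lam = a / b - 1. The unrolled procedure should approach that value as the number of inner steps grows, which mirrors, in a very small setting, the kind of convergence of approximate to exact solutions that the abstract refers to.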
Pseudo-task Augmentation: From Deep Multitask Learning to Intratask Sharing—and Back
Elliot Meyerson · Risto Miikkulainen
Deep multitask learning boosts performance by sharing learned structure across related tasks. This paper adapts ideas from deep multitask learning to the setting where only a single task is available. The method is formalized as pseudo-task augmentation, in which models are trained with multiple decoders for each task. Pseudo-tasks simulate the effect of training towards closely-related tasks drawn from the same universe. In a suite of experiments, pseudo-task augmentation is shown to improve performance on single-task learning problems. When combined with multitask learning, further improvements are achieved, including state-of-the-art performance on the CelebA dataset, showing that pseudo-task augmentation and multitask learning have complementary value. All in all, pseudo-task augmentation is a broadly applicable and efficient way to boost performance in deep learning systems.