

Session

Transfer and Multi-Task Learning 2


Thu 12 July 8:00 - 8:20 PDT

Composable Planning with Attributes

Amy Zhang · Sainbayar Sukhbaatar · Adam Lerer · Arthur Szlam · Rob Fergus

The tasks that an agent will need to solve are often not known during training. However, if the agent knows which properties of the environment are important, then, after learning how its actions affect those properties, it may be able to use this knowledge to solve complex tasks without training specifically for them. Towards this end, we consider a setup in which an environment is augmented with a set of user-defined attributes that parameterize the features of interest. We propose a method that learns a policy for transitioning between "nearby" sets of attributes, and maintains a graph of possible transitions. Given a task at test time that can be expressed in terms of a target set of attributes, and a current state, our model infers the attributes of the current state, searches over paths through attribute space to get a high-level plan, and then uses its low-level policy to execute the plan. We show in 3D block stacking, grid-world games, and StarCraft that our model is able to generalize to longer, more complex tasks at test time by composing simpler learned policies.
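The planning loop the abstract describes decomposes into attribute inference, graph search, and low-level control. A minimal sketch, assuming a learned attribute detector, a transition graph, and a goal-conditioned policy are available; all names here are illustrative placeholders, not the paper's interface:

```python
# Hypothetical planning loop: infer attributes, plan in attribute space,
# then execute the plan one edge at a time with the low-level policy.
from collections import deque

def plan_attribute_path(graph, start, goal):
    """BFS over the graph of feasible attribute-set transitions.

    graph maps an attribute set (e.g. a frozenset) to the 'nearby'
    attribute sets the low-level policy has learned to reach.
    """
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path                       # the high-level plan
        for nxt in graph.get(path[-1], ()):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None                               # goal unreachable

def solve(env, state, goal_attrs, infer_attributes, graph, low_level_policy):
    """Infer the current attributes, plan, and act."""
    current = infer_attributes(state)         # learned attribute detector
    plan = plan_attribute_path(graph, current, goal_attrs)
    if plan is None:
        return state
    for subgoal in plan[1:]:                  # one low-level rollout per edge
        state = low_level_policy(env, state, subgoal)
    return state
```

Because the graph search happens at test time, new goals compose learned edges without any retraining, which is the source of the generalization the abstract reports.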

Thu 12 July 8:20 - 8:40 PDT

Measuring abstract reasoning in neural networks

Adam Santoro · Felix Hill · David GT Barrett · Ari S Morcos · Timothy Lillicrap

Whether neural networks can learn abstract reasoning or whether they merely rely on superficial statistics is a topic of recent debate. Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation 'regimes' in which the training data and test questions differ in clearly-defined ways. We show that popular models such as ResNets perform poorly, even when the training and test sets differ only minimally, and we present a novel architecture, with structure designed to encourage reasoning, that does significantly better. When we vary the way in which the test questions and training data differ, we find that our model is notably proficient at certain forms of generalisation, but notably weak at others. We further show that the model's ability to generalise improves markedly if it is trained to predict symbolic explanations for its answers. Altogether, we introduce and explore ways to both measure and induce stronger abstract reasoning in neural networks. Our freely-available dataset should motivate further progress in this direction.
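The auxiliary training signal mentioned above, predicting symbolic explanations alongside answers, amounts to adding a multi-label loss on explanation bits to the answer loss. A minimal PyTorch sketch, assuming the model emits logits for both heads; the weighting `beta` and all names are illustrative, not the paper's exact setup:

```python
import torch
import torch.nn.functional as F

def reasoning_loss(answer_logits, answer_target, meta_logits, meta_target, beta=10.0):
    """Cross-entropy over the candidate answer panels, plus a weighted
    multi-label loss on the binary 'symbolic explanation' vector."""
    answer_loss = F.cross_entropy(answer_logits, answer_target)
    meta_loss = F.binary_cross_entropy_with_logits(meta_logits, meta_target)
    return answer_loss + beta * meta_loss
```

The key point is that the explanation head is trained jointly, so the model is rewarded for representing the underlying rules, not just the correct panel.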

Thu 12 July 8:40 - 8:50 PDT

GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks

Zhao Chen · Vijay Badrinarayanan · Chen-Yu Lee · Andrew Rabinovich

Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly. We present a gradient normalization (GradNorm) algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting across multiple tasks when compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite involving only a single asymmetry hyperparameter $\alpha$. Thus, what was once a tedious search process that incurred exponentially more compute for each task added can now be accomplished within a few training runs, irrespective of the number of tasks. Ultimately, we demonstrate that gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning.
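The balancing step works by pulling each task's weighted gradient norm at a shared layer toward a common scale, modulated by that task's relative training rate raised to the power $\alpha$. A minimal PyTorch sketch, assuming a single shared weight tensor and treating the target norms as constants; the learning rate and `alpha` values are illustrative placeholders:

```python
import torch

def gradnorm_step(task_losses, loss_weights, shared_weight, initial_losses,
                  alpha=1.5, lr=0.025):
    """One update of the task weights w_i in the GradNorm style.

    task_losses: list of scalar losses L_i(t) from the current batch.
    loss_weights: 1-D tensor of w_i with requires_grad=True.
    shared_weight: weight tensor of the last shared layer.
    initial_losses: L_i(0), recorded on the first training step.
    """
    # Norm of the gradient of each weighted task loss at the shared layer.
    norms = torch.stack([
        torch.autograd.grad(w * L, shared_weight,
                            retain_graph=True, create_graph=True)[0].norm()
        for w, L in zip(loss_weights, task_losses)
    ])

    # Relative inverse training rates r_i = (L_i / L_i(0)) / mean(.)
    with torch.no_grad():
        ratios = torch.stack([L / L0 for L, L0 in zip(task_losses, initial_losses)])
        r = ratios / ratios.mean()
        target = norms.mean() * r.pow(alpha)   # desired norms, held constant

    # L1 distance between actual and desired norms drives the weight update.
    gradnorm_loss = (norms - target).abs().sum()
    grad_w = torch.autograd.grad(gradnorm_loss, loss_weights)[0]
    with torch.no_grad():
        loss_weights -= lr * grad_w
        loss_weights *= len(task_losses) / loss_weights.sum()  # keep sum(w_i) = T
```

In practice the weights would be updated alongside the network parameters at every step, so tasks that train faster (smaller loss ratio) automatically receive smaller gradient magnitudes.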

Thu 12 July 8:50 - 9:00 PDT

Explicit Inductive Bias for Transfer Learning with Convolutional Networks

Xuhong LI · Yves Grandvalet · Franck Davoine

In inductive transfer learning, fine-tuning pre-trained convolutional networks substantially outperforms training from scratch. When fine-tuning, the underlying assumption is that the pre-trained model extracts generic features that are at least partially relevant for solving the target task, but would be difficult to extract from the limited amount of data available for the target task. However, besides initialization with the pre-trained model and early stopping, fine-tuning has no mechanism for retaining the features learned on the source task. In this paper, we investigate several regularization schemes that explicitly promote the similarity of the final solution to the initial model. We show the benefit of an explicit inductive bias towards the initial model, and we ultimately recommend a simple $L^2$ penalty, with the pre-trained model as the reference, as the baseline penalty for transfer learning tasks.
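The recommended regularizer anchors the weights at the pre-trained starting point rather than at zero. A minimal PyTorch sketch, assuming a dict of frozen pre-trained tensors keyed by parameter name; the coefficient `beta` is an illustrative placeholder:

```python
import torch

def l2_sp_penalty(model, pretrained, beta=0.01):
    """Penalize squared distance from the pre-trained weights w^0
    instead of from zero: beta * sum_j ||w_j - w_j^0||^2."""
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in pretrained:               # layers inherited from the source model
            penalty = penalty + (p - pretrained[name].detach()).pow(2).sum()
    return beta * penalty

# Usage: total_loss = task_loss + l2_sp_penalty(model, pretrained_state_dict)
```

Compared with plain weight decay, the only change is the reference point of the penalty, which is what gives fine-tuning an explicit mechanism for retaining source-task features.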