

Poster in Workshop: HiLD: High-dimensional Learning Dynamics Workshop

Implicit regularization of multi-task learning and finetuning in overparameterized neural networks

Samuel Lippl · Jack Lindsey


Abstract: It is common in deep learning to train networks on auxiliary tasks with the expectation that the learning will transfer, at least partially, to another task of interest. In this work, we investigate the inductive biases that result from learning auxiliary tasks, either simultaneously (multi-task learning, MTL) or sequentially (pretraining and subsequent finetuning, PT+FT). In the simplified setting of two-layer diagonal linear networks trained with gradient descent, we show that multi-task learning is biased to minimize an $\ell_{1,2}$ mixed norm, which incentivizes sharing features across tasks, and that PT+FT minimizes a similar penalty. In simulations of diagonal linear networks, we show that MTL and PT+FT yield comparable improvements in sample efficiency over single-task learning. In ReLU networks, we find that while MTL still minimizes an analogue of the $\ell_{1,2}$ penalty, PT+FT exhibits a qualitatively different bias that permits learning of features correlated with (but not identical to) those used for the auxiliary task. As a result, we find that in realistic settings, MTL generalizes better when comparatively little data is available for the task of interest, while PT+FT outperforms it when more data is available. Strikingly, the same effect holds for a deep convolutional architecture trained on an image classification task. Our results shed light on the structure leveraged by transfer learning strategies and suggest pathways to improving their performance.
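For reference, the $\ell_{1,2}$ mixed norm mentioned in the abstract is standardly defined as below; the convention that rows of the weight matrix $W$ index features and columns index tasks is an illustrative assumption, not a detail taken from the poster itself:

$$\|W\|_{1,2} \;=\; \sum_{i} \Big(\sum_{t} W_{it}^{2}\Big)^{1/2}.$$

Because this is an $\ell_1$ penalty over rows, each aggregating a feature's weights across tasks in $\ell_2$, it drives entire rows to zero while leaving shared rows dense, consistent with the abstract's statement that the bias incentivizes sharing features across tasks.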
