Multi-task learning allows the sharing of useful information between multiple related tasks. In natural language processing, several recent approaches have successfully leveraged unsupervised pre-training on large amounts of data to perform well on various tasks, such as those in the GLUE benchmark. These results are based on fine-tuning on each task separately. We explore the multi-task learning setting for the recent BERT model on the GLUE benchmark, and how best to add task-specific parameters to a pre-trained BERT network while keeping a high degree of parameter sharing between tasks. We introduce new adaptation modules, PALs or ‘projected attention layers’, which use a low-dimensional multi-head attention mechanism, based on the idea that it is important to include layers with inductive biases useful for the input domain. By using PALs in parallel with BERT layers, we match the performance of fine-tuned BERT on the GLUE benchmark with ≈7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset.
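To make the idea of a low-dimensional attention module running in parallel with a BERT layer more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the sizes (hidden size 768, projected size 204, 12 heads), the function names, the random weights, and the identity stand-in for the BERT layer are all hypothetical, and details such as layer normalisation, biases, and how weights are shared across layers or tasks are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, num_heads):
    """Plain multi-head self-attention on x of shape (seq_len, d)."""
    seq_len, d = x.shape
    d_head = d // num_heads

    def split(w):
        # Project, then split features into heads: (heads, seq_len, d_head).
        return (x @ w).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(w_q), split(w_k), split(w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    out = softmax(scores) @ v                            # (heads, seq, d_head)
    return out.transpose(1, 0, 2).reshape(seq_len, d)    # merge heads

def projected_attention_layer(h, v_enc, v_dec, w_q, w_k, w_v, num_heads):
    """Project down, attend in the small space, project back up."""
    low = h @ v_enc                                      # (seq, d_small)
    attended = multi_head_attention(low, w_q, w_k, w_v, num_heads)
    return attended @ v_dec                              # (seq, d_model)

# Assumed, illustrative sizes (not taken from the abstract).
d_model, d_small, num_heads, seq_len = 768, 204, 12, 5
rng = np.random.default_rng(0)

def init(*shape):
    return rng.normal(scale=0.02, size=shape)

v_enc, v_dec = init(d_model, d_small), init(d_small, d_model)
w_q, w_k, w_v = (init(d_small, d_small) for _ in range(3))

h = rng.normal(size=(seq_len, d_model))   # hidden states entering a layer
bert_layer_out = h                        # stand-in for the real BERT layer
pal_out = projected_attention_layer(h, v_enc, v_dec, w_q, w_k, w_v, num_heads)
combined = bert_layer_out + pal_out       # task-specific path added in parallel

# Task-specific weights in this one module (no biases in this sketch).
pal_params = 2 * d_model * d_small + 3 * d_small * d_small
print(combined.shape, pal_params)
```

Because attention is computed in the projected space, the task-specific weight count grows with the small dimension rather than with the full hidden size, which is what lets most of the network stay shared between tasks.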
Asa Cooper Stickland (University of Edinburgh)
Iain Murray (University of Edinburgh)
Iain Murray is a SICSA Lecturer in Machine Learning at the University of Edinburgh. Iain was introduced to machine learning by David MacKay and Zoubin Ghahramani, both previous NIPS tutorial speakers. He obtained his PhD in 2007 from the Gatsby Computational Neuroscience Unit at UCL. His thesis on Monte Carlo methods received an honourable mention for the ISBA Savage Award. He was a Commonwealth Fellow in Machine Learning at the University of Toronto before moving to Edinburgh in 2010. Iain's research interests include building flexible probabilistic models of data, and probabilistic inference from indirect and uncertain observations. Iain is passionate about teaching. He has lectured at several summer schools, is listed in the top 15 authors on videolectures.net, and was awarded the EUSA Van Heyningen Award for Teaching in Science and Engineering in 2015.
Related Events (a corresponding poster, oral, or spotlight)
2019 Poster: BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning
Wed Jun 12th, 06:30 -- 09:00 PM, Room Pacific Ballroom