Poster
BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning
Asa Cooper Stickland · Iain Murray

Wed Jun 12 06:30 PM -- 09:00 PM (PDT) @ Pacific Ballroom #258
Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or 'projected attention layers', we match the performance of separately fine-tuned models on the GLUE benchmark with ≈7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset.
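To make the idea of a small task-specific module next to a shared model more concrete, here is a minimal NumPy sketch of one possible reading of a "projected attention layer". It assumes the module projects the hidden state down to a smaller size, applies multi-head self-attention there, projects back up, and adds the result in parallel to a shared layer's output. All names (`projected_attention`, `adapted_layer`, `init_pal_params`) are hypothetical. The sizes 768 (BERT-base hidden size) and 204 (small size) are illustrative assumptions, and `np.tanh` stands in for the shared BERT sublayer. This is a sketch, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, wq, wk, wv, num_heads):
    # Scaled dot-product self-attention on x with shape (seq_len, d_small).
    seq_len, d = x.shape
    dh = d // num_heads

    def split(m):
        # (seq_len, d) -> (num_heads, seq_len, dh)
        return m.reshape(seq_len, num_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, seq, seq)
    out = softmax(scores) @ v                        # (heads, seq, dh)
    return out.transpose(1, 0, 2).reshape(seq_len, d)


def projected_attention(h, p, num_heads):
    # Hypothetical PAL-style module: project d_model -> d_small,
    # attend in the small space, then project d_small -> d_model.
    small = h @ p["v_enc"]
    attended = multi_head_attention(small, p["wq"], p["wk"], p["wv"], num_heads)
    return attended @ p["v_dec"]


def adapted_layer(h, shared_sublayer, p, num_heads):
    # Assumption: the shared sublayer and the task-specific module see the
    # same input, and their outputs are summed together with a residual.
    return h + shared_sublayer(h) + projected_attention(h, p, num_heads)


def init_pal_params(d_model, d_small, rng):
    # Small random task-specific weights; only these differ between tasks.
    shapes = {
        "v_enc": (d_model, d_small),
        "v_dec": (d_small, d_model),
        "wq": (d_small, d_small),
        "wk": (d_small, d_small),
        "wv": (d_small, d_small),
    }
    return {name: 0.02 * rng.standard_normal(s) for name, s in shapes.items()}


def count_params(p):
    # Total number of task-specific parameters in one module.
    return sum(w.size for w in p.values())


rng = np.random.default_rng(0)
h = rng.standard_normal((5, 768))        # 5 tokens, BERT-base hidden size
p = init_pal_params(768, 204, rng)
out = adapted_layer(h, np.tanh, p, num_heads=12)
print(out.shape)        # (5, 768)
print(count_params(p))  # 438192
```

Because the down- and up-projections dominate the count (2 × 768 × 204), the per-task cost in this sketch grows with the chosen small size rather than with a full copy of the shared layer.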

#### Author Information

##### Iain Murray (University of Edinburgh)

Iain Murray is a SICSA Lecturer in Machine Learning at the University of Edinburgh. Iain was introduced to machine learning by David MacKay and Zoubin Ghahramani, both previous NIPS tutorial speakers. He obtained his PhD in 2007 from the Gatsby Computational Neuroscience Unit at UCL. His thesis on Monte Carlo methods received an honourable mention for the ISBA Savage Award. He was a Commonwealth Fellow in Machine Learning at the University of Toronto before moving to Edinburgh in 2010. Iain's research interests include building flexible probabilistic models of data, and probabilistic inference from indirect and uncertain observations. Iain is passionate about teaching. He has lectured at several summer schools, is listed among the top 15 authors on videolectures.net, and was awarded the EUSA Van Heyningen Award for Teaching in Science and Engineering in 2015.