Timezone: »
Oral
Parameter-Efficient Transfer Learning for NLP
Neil Houlsby · Andrei Giurgiu · Stanislaw Jastrzebski · Bruna Morrone · Quentin de Laroussilhe · Andrea Gesmundo · Mona Attariyan · Sylvain Gelly
Fine-tuning large pretrained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate adapter's effectiveness, we transfer the recently proposed BERT Transformer model to $26$ diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance, whilst adding only a few parameters per task. On GLUE, we attain within $0.8\%$ of the performance of full fine-tuning, adding only $3.6\%$ parameters per task. By contrast, fine-tuning trains $100\%$ of the parameters per task.
Author Information
Neil Houlsby (Google)
Andrei Giurgiu (Google)
Stanislaw Jastrzebski (New York University)
Bruna Morrone (Google)
Quentin de Laroussilhe (Google Brain)
Andrea Gesmundo (Google)
Mona Attariyan (Google)
Sylvain Gelly (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Poster: Parameter-Efficient Transfer Learning for NLP »
Fri. Jun 14th 01:30 -- 04:00 AM Room Pacific Ballroom #102
More from the Same Authors
-
2022 Poster: Characterizing and Overcoming the Greedy Nature of Learning in Multi-modal Deep Neural Networks »
Nan Wu · Stanislaw Jastrzebski · Kyunghyun Cho · Krzysztof J Geras -
2022 Spotlight: Characterizing and Overcoming the Greedy Nature of Learning in Multi-modal Deep Neural Networks »
Nan Wu · Stanislaw Jastrzebski · Kyunghyun Cho · Krzysztof J Geras -
2021 Poster: Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization »
Stanislaw Jastrzebski · Devansh Arpit · Oliver Astrand · Giancarlo Kerg · Huan Wang · Caiming Xiong · Richard Socher · Kyunghyun Cho · Krzysztof J Geras -
2021 Spotlight: Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization »
Stanislaw Jastrzebski · Devansh Arpit · Oliver Astrand · Giancarlo Kerg · Huan Wang · Caiming Xiong · Richard Socher · Kyunghyun Cho · Krzysztof J Geras -
2019 : Poster discussion »
Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari -
2019 : Poster Session 1 (all papers) »
Matilde Gargiani · Yochai Zur · Chaim Baskin · Evgenii Zheltonozhskii · Liam Li · Ameet Talwalkar · Xuedong Shang · Harkirat Singh Behl · Atilim Gunes Baydin · Ivo Couckuyt · Tom Dhaene · Chieh Lin · Wei Wei · Min Sun · Orchid Majumder · Michele Donini · Yoshihiko Ozaki · Ryan P. Adams · Christian Geißler · Ping Luo · zhanglin peng · · Ruimao Zhang · John Langford · Rich Caruana · Debadeepta Dey · Charles Weill · Xavi Gonzalvo · Scott Yang · Scott Yak · Eugen Hotaj · Vladimir Macko · Mehryar Mohri · Corinna Cortes · Stefan Webb · Jonathan Chen · Martin Jankowiak · Noah Goodman · Aaron Klein · Frank Hutter · Mojan Javaheripi · Mohammad Samragh · Sungbin Lim · Taesup Kim · SUNGWOONG KIM · Michael Volpp · Iddo Drori · Yamuna Krishnamurthy · Kyunghyun Cho · Stanislaw Jastrzebski · Quentin de Laroussilhe · Mingxing Tan · Xiao Ma · Neil Houlsby · Andrea Gesmundo · Zalán Borsos · Krzysztof Maziarz · Felipe Petroski Such · Joel Lehman · Kenneth Stanley · Jeff Clune · Pieter Gijsbers · Joaquin Vanschoren · Felix Mohr · Eyke Hüllermeier · Zheng Xiong · Wenpeng Zhang · Wenwu Zhu · Weijia Shao · Aleksandra Faust · Michal Valko · Michael Y Li · Hugo Jair Escalante · Marcel Wever · Andrey Khorlin · Tara Javidi · Anthony Francis · Saurajit Mukherjee · Jungtaek Kim · Michael McCourt · Saehoon Kim · Tackgeun You · Seungjin Choi · Nicolas Knudde · Alexander Tornede · Ghassen Jerfel -
2019 Poster: Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities »
Octavian-Eugen Ganea · Sylvain Gelly · Gary Becigneul · Aliaksei Severyn -
2019 Oral: Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities »
Octavian-Eugen Ganea · Sylvain Gelly · Gary Becigneul · Aliaksei Severyn -
2019 Poster: A Large-Scale Study on Regularization and Normalization in GANs »
Karol Kurach · Mario Lucic · Xiaohua Zhai · Marcin Michalski · Sylvain Gelly -
2019 Oral: A Large-Scale Study on Regularization and Normalization in GANs »
Karol Kurach · Mario Lucic · Xiaohua Zhai · Marcin Michalski · Sylvain Gelly -
2019 Poster: Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations »
Francesco Locatello · Stefan Bauer · Mario Lucic · Gunnar Ratsch · Sylvain Gelly · Bernhard Schölkopf · Olivier Bachem -
2019 Poster: High-Fidelity Image Generation With Fewer Labels »
Mario Lucic · Michael Tschannen · Marvin Ritter · Xiaohua Zhai · Olivier Bachem · Sylvain Gelly -
2019 Oral: High-Fidelity Image Generation With Fewer Labels »
Mario Lucic · Michael Tschannen · Marvin Ritter · Xiaohua Zhai · Olivier Bachem · Sylvain Gelly -
2019 Oral: Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations »
Francesco Locatello · Stefan Bauer · Mario Lucic · Gunnar Ratsch · Sylvain Gelly · Bernhard Schölkopf · Olivier Bachem -
2017 Poster: A Closer Look at Memorization in Deep Networks »
David Krueger · Yoshua Bengio · Stanislaw Jastrzebski · Maxinder S. Kanwal · Nicolas Ballas · Asja Fischer · Emmanuel Bengio · Devansh Arpit · Tegan Maharaj · Aaron Courville · Simon Lacoste-Julien -
2017 Talk: A Closer Look at Memorization in Deep Networks »
David Krueger · Yoshua Bengio · Stanislaw Jastrzebski · Maxinder S. Kanwal · Nicolas Ballas · Asja Fischer · Emmanuel Bengio · Devansh Arpit · Tegan Maharaj · Aaron Courville · Simon Lacoste-Julien