Task-specific fine-tuning of pre-trained transformers has achieved performance breakthroughs on multiple NLP tasks. Yet, because both computation and parameter size grow linearly with the number of sub-tasks, such methods are increasingly difficult to deploy in the real world due to unrealistic memory and computation overhead on computing devices. Previous work on fine-tuning focuses on reducing the growing parameter size via parameter sharing to save storage cost. However, in modern computing environments, computation is a more critical constraint for fine-tuned models than storage. In this work, we propose LeTS, a framework that leverages both computation and parameter sharing across multiple tasks. In contrast to traditional fine-tuning, LeTS introduces a novel neural architecture consisting of a fixed pre-trained transformer model plus learnable additive components for each sub-task. The learnable components reuse the intermediate activations of the fixed pre-trained model, decoupling the computation dependency. Differentiable neural architecture search is used to determine a task-specific computation-sharing scheme, and a novel early-stage pruning method is applied to the additive components to induce sparsity and achieve parameter sharing. Extensive experiments show that, with 1.4% extra parameters per task, LeTS reduces computation by 49.5% on the GLUE benchmark with only 0.2% accuracy loss compared to full fine-tuning.
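To make the architectural idea concrete, the sketch below illustrates in plain PyTorch how a frozen pre-trained backbone can be run once, its intermediate activations cached, and small learnable additive side modules per task reuse those activations. This is a minimal illustration, not the authors' code: the class names (FrozenBackbone, AdditiveSideModule, TaskHead) and all dimensions are assumptions, and the sketch omits LeTS's differentiable architecture search and early-stage pruning.

```python
# Minimal sketch of computation sharing with a frozen backbone and
# learnable additive side modules (illustrative only, not LeTS itself).
import torch
import torch.nn as nn

class FrozenBackbone(nn.Module):
    """Stands in for a fixed pre-trained transformer; all parameters are frozen."""
    def __init__(self, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        for p in self.parameters():
            p.requires_grad = False

    @torch.no_grad()
    def forward(self, x):
        # Return the activation after every layer so side modules can reuse them.
        acts = []
        for layer in self.layers:
            x = layer(x)
            acts.append(x)
        return acts

class AdditiveSideModule(nn.Module):
    """Small learnable component consuming a frozen intermediate activation."""
    def __init__(self, d_model=256, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, h_frozen, h_side):
        # Additive update: side state plus a transform of the frozen activation.
        return h_side + self.up(torch.relu(self.down(h_frozen)))

class TaskHead(nn.Module):
    """Per-task side network: one additive module per backbone layer + classifier."""
    def __init__(self, d_model=256, n_layers=4, n_classes=2):
        super().__init__()
        self.side = nn.ModuleList(AdditiveSideModule(d_model) for _ in range(n_layers))
        self.cls = nn.Linear(d_model, n_classes)

    def forward(self, frozen_acts):
        h = torch.zeros_like(frozen_acts[0])
        for mod, act in zip(self.side, frozen_acts):
            h = mod(act, h)
        return self.cls(h.mean(dim=1))  # pool over tokens

backbone = FrozenBackbone()
tasks = {"sst2": TaskHead(n_classes=2), "mnli": TaskHead(n_classes=3)}

x = torch.randn(8, 16, 256)          # (batch, tokens, hidden)
frozen_acts = backbone(x)            # computed once, shared by every task
logits = {name: head(frozen_acts) for name, head in tasks.items()}
```

Because the frozen backbone's forward pass is computed once and its activations are shared by every task head, the per-task cost is only the small side modules, which is the kind of computation sharing the abstract describes.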
Author Information
Cheng Fu (University of California, San Diego)
Hanxian Huang (UC San Diego)
Xinyun Chen (UC Berkeley)
Yuandong Tian (Facebook AI Research)
Jishen Zhao (University of California, San Diego)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Learn-to-Share: A Hardware-friendly Transfer Learning Framework Exploiting Computation and Parameter Sharing »
  Fri. Jul 23rd 04:00 -- 06:00 AM Room
More from the Same Authors
- 2021 : Learning Space Partitions for Path Planning »
  Kevin Yang · Tianjun Zhang · Chris Cummins · Brandon Cui · Benoit Steiner · Linnan Wang · Joseph E Gonzalez · Dan Klein · Yuandong Tian
- 2022 Poster: Denoised MDPs: Learning World Models Better Than the World Itself »
  Tongzhou Wang · Simon Du · Antonio Torralba · Phillip Isola · Amy Zhang · Yuandong Tian
- 2022 Spotlight: Denoised MDPs: Learning World Models Better Than the World Itself »
  Tongzhou Wang · Simon Du · Antonio Torralba · Phillip Isola · Amy Zhang · Yuandong Tian
- 2021 Workshop: Workshop on Socially Responsible Machine Learning »
  Chaowei Xiao · Animashree Anandkumar · Mingyan Liu · Dawn Song · Raquel Urtasun · Jieyu Zhao · Xueru Zhang · Cihang Xie · Xinyun Chen · Bo Li
- 2021 : RL + Operations Research Panel »
  Jim Dai · Fei Fang · Shie Mannor · Yuandong Tian · Zhiwei (Tony) Qin · Zongqing Lu
- 2021 Poster: SpreadsheetCoder: Formula Prediction from Semi-structured Context »
  Xinyun Chen · Petros Maniatis · Rishabh Singh · Charles Sutton · Hanjun Dai · Max Lin · Denny Zhou
- 2021 Poster: LEGO: Latent Execution-Guided Reasoning for Multi-Hop Question Answering on Knowledge Graphs »
  Hongyu Ren · Hanjun Dai · Bo Dai · Xinyun Chen · Michihiro Yasunaga · Haitian Sun · Dale Schuurmans · Jure Leskovec · Denny Zhou
- 2021 Poster: Understanding self-supervised learning dynamics without contrastive pairs »
  Yuandong Tian · Xinlei Chen · Surya Ganguli
- 2021 Spotlight: SpreadsheetCoder: Formula Prediction from Semi-structured Context »
  Xinyun Chen · Petros Maniatis · Rishabh Singh · Charles Sutton · Hanjun Dai · Max Lin · Denny Zhou
- 2021 Spotlight: LEGO: Latent Execution-Guided Reasoning for Multi-Hop Question Answering on Knowledge Graphs »
  Hongyu Ren · Hanjun Dai · Bo Dai · Xinyun Chen · Michihiro Yasunaga · Haitian Sun · Dale Schuurmans · Jure Leskovec · Denny Zhou
- 2021 Oral: Understanding self-supervised learning dynamics without contrastive pairs »
  Yuandong Tian · Xinlei Chen · Surya Ganguli
- 2021 Poster: Few-Shot Neural Architecture Search »
  Yiyang Zhao · Linnan Wang · Yuandong Tian · Rodrigo Fonseca · Tian Guo
- 2021 Oral: Few-Shot Neural Architecture Search »
  Yiyang Zhao · Linnan Wang · Yuandong Tian · Rodrigo Fonseca · Tian Guo
- 2020 Poster: Student Specialization in Deep Rectified Networks With Finite Width and Input Dimension »
  Yuandong Tian
- 2019 Poster: ELF OpenGo: an analysis and open reimplementation of AlphaZero »
  Yuandong Tian · Jerry Ma · Qucheng Gong · Shubho Sengupta · Zhuoyuan Chen · James Pinkerton · Larry Zitnick
- 2019 Oral: ELF OpenGo: an analysis and open reimplementation of AlphaZero »
  Yuandong Tian · Jerry Ma · Qucheng Gong · Shubho Sengupta · Zhuoyuan Chen · James Pinkerton · Larry Zitnick
- 2018 Poster: Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima »
  Simon Du · Jason Lee · Yuandong Tian · Aarti Singh · Barnabás Póczos
- 2018 Oral: Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima »
  Simon Du · Jason Lee · Yuandong Tian · Aarti Singh · Barnabás Póczos
- 2017 Poster: An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis »
  Yuandong Tian
- 2017 Talk: An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis »
  Yuandong Tian