Fine-tuning large language models for different tasks can be costly and inefficient, and even methods that reduce the number of tuned parameters still require full gradient-based optimization. We propose HyperTuning, a novel approach to model adaptation that uses a hypermodel to generate task-specific parameters for a fixed downstream model. We demonstrate a simple setup for HyperTuning with HyperT5, a T5-based hypermodel that produces soft prefixes or LoRA parameters for a frozen T5 model from few-shot examples. We train HyperT5 in two stages: first, hyperpretraining with a modified conditional language modeling objective that trains the hypermodel to generate parameters; second, multi-task fine-tuning (MTF) on a large number of diverse language tasks. We evaluate HyperT5 on the P3, MetaICL, and Super-NaturalInstructions datasets and show that it can effectively generate parameters for unseen tasks. Moreover, we show that using hypermodel-generated parameters as initializations for further parameter-efficient fine-tuning improves performance. HyperTuning can thus be a flexible and efficient way to leverage large language models for diverse downstream applications.
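To make the setup concrete, below is a minimal sketch of the core idea: a small hypernetwork consumes a pooled encoding of few-shot examples and emits LoRA matrices that are applied on top of a frozen linear layer. All names (`HyperLoRA`, `lora_forward`) and dimensions are illustrative assumptions, not the paper's HyperT5 implementation, which uses a full T5 encoder-decoder as the hypermodel.

```python
import torch
import torch.nn as nn

class HyperLoRA(nn.Module):
    """Toy hypermodel: maps a pooled encoding of few-shot examples to
    LoRA A/B matrices for a single frozen linear layer.
    Illustrative only; the paper's HyperT5 is a full T5 encoder-decoder."""

    def __init__(self, enc_dim: int, d_model: int, rank: int):
        super().__init__()
        self.rank, self.d_model = rank, d_model
        # One linear head per LoRA factor (hypothetical design choice).
        self.to_a = nn.Linear(enc_dim, rank * d_model)
        self.to_b = nn.Linear(enc_dim, d_model * rank)

    def forward(self, fewshot_encoding: torch.Tensor):
        # fewshot_encoding: (enc_dim,) pooled representation of the examples
        a = self.to_a(fewshot_encoding).view(self.rank, self.d_model)
        b = self.to_b(fewshot_encoding).view(self.d_model, self.rank)
        return a, b

def lora_forward(frozen_linear, x, a, b, scale=1.0):
    # Frozen base weight plus generated low-rank update: W x + scale * B (A x)
    return frozen_linear(x) + scale * (x @ a.t()) @ b.t()

# Generate parameters from a (here random) few-shot encoding, then run
# the frozen layer with the generated LoRA update applied.
d_model, enc_dim, rank = 64, 128, 4
frozen = nn.Linear(d_model, d_model)
for p in frozen.parameters():
    p.requires_grad_(False)  # the downstream model stays fixed

hyper = HyperLoRA(enc_dim, d_model, rank)
a, b = hyper(torch.randn(enc_dim))
x = torch.randn(2, d_model)
print(lora_forward(frozen, x, a, b).shape)  # torch.Size([2, 64])
```

To mirror the paper's initialization result in this sketch, one could detach the generated `a` and `b`, wrap them in `nn.Parameter`, and continue optimizing them directly as an ordinary parameter-efficient fine-tuning run.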
Author Information
Jason Phang (NYU)
Yi Mao (Microsoft)
Pengcheng He (Microsoft)
Weizhu Chen (Microsoft)
More from the Same Authors
- 2023 Poster: Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise »
  Zhenghao Lin · Yeyun Gong · Yelong Shen · Tong Wu · Zhihao Fan · Chen Lin · Nan Duan · Weizhu Chen
- 2023 Poster: Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models »
  Zhihong Shao · Yeyun Gong · Yelong Shen · Minlie Huang · Nan Duan · Weizhu Chen
- 2023 Oral: Pretraining Language Models with Human Preferences »
  Tomasz Korbak · Kejian Shi · Angelica Chen · Rasika Bhalerao · Christopher Buckley · Jason Phang · Samuel Bowman · Ethan Perez
- 2023 Poster: LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation »
  Yixiao Li · Yifan Yu · Qingru Zhang · Chen Liang · Pengcheng He · Weizhu Chen · Tuo Zhao
- 2023 Poster: Less is More: Task-aware Layer-wise Distillation for Language Model Compression »
  Chen Liang · Simiao Zuo · Qingru Zhang · Pengcheng He · Weizhu Chen · Tuo Zhao
- 2023 Poster: POUF: Prompt-Oriented Unsupervised Fine-tuning for Large Pre-trained Models »
  Korawat Tanwisuth · Shujian Zhang · Huangjie Zheng · Pengcheng He · Mingyuan Zhou
- 2022 Spotlight: PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance »
  Qingru Zhang · Simiao Zuo · Chen Liang · Alexander Bukharin · Pengcheng He · Weizhu Chen · Tuo Zhao
- 2021 Spotlight: BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining »
  Weizhen Qi · Yeyun Gong · Jian Jiao · Yu Yan · Weizhu Chen · Dayiheng Liu · Kewen Tang · Houqiang Li · Jiusheng Chen · Ruofei Zhang · Ming Zhou · Nan Duan
- 2021 Spotlight: Poolingformer: Long Document Modeling with Pooling Attention »
  Hang ZHANG · Yeyun Gong · Yelong Shen · Weisheng Li · Jiancheng Lv · Nan Duan · Weizhu Chen