In this paper, we propose BANG, a new pretraining model to Bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation. AR and NAR generation can be uniformly regarded as differing in the extent to which previous tokens can be attended to, and BANG bridges the two by designing a novel model structure for large-scale pretraining. A pretrained BANG model can simultaneously support AR, NAR, and semi-NAR generation to meet different requirements. Experiments on question generation (SQuAD 1.1), summarization (XSum), and dialogue generation (PersonaChat) show that BANG significantly improves NAR and semi-NAR performance while attaining performance comparable to strong AR pretrained models. Compared with strong semi-NAR baselines, BANG achieves absolute improvements of 14.01 and 5.24 in the overall scores of SQuAD 1.1 and XSum, respectively. In addition, compared with strong NAR baselines, BANG achieves absolute improvements of 10.73, 6.39, and 5.90 in the overall scores of SQuAD 1.1, XSum, and PersonaChat, respectively. Our code will be made publicly available.
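The abstract's framing, that AR, semi-NAR, and NAR decoding differ only in how many previous target tokens each position may attend to, can be illustrated with attention masks. The sketch below is not BANG's actual pretraining structure (the paper's cross-stream visibility design is more involved); it is a minimal illustration of the AR-to-NAR spectrum, with the function names and the block-size parameter `k` chosen here for illustration only.

```python
import numpy as np

def ar_mask(n):
    # Autoregressive: position i attends to all previous target
    # positions 0..i (a standard lower-triangular causal mask).
    return np.tril(np.ones((n, n), dtype=bool))

def nar_mask(n):
    # Non-autoregressive: every target token is predicted in parallel,
    # so no position attends to any other previously generated target
    # position (only to itself; source tokens are attended separately).
    return np.eye(n, dtype=bool)

def semi_nar_mask(n, k):
    # Semi-NAR: tokens are emitted in blocks of size k; each position
    # attends to all positions in strictly earlier blocks, but not to
    # other positions within its own block.
    blocks = np.arange(n) // k
    return (blocks[:, None] > blocks[None, :]) | np.eye(n, dtype=bool)
```

With `k = 1` the semi-NAR mask reduces to the AR mask, and with `k = n` it reduces to the NAR mask, which is the sense in which the three decoding modes sit on one continuum.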
Author Information
Weizhen Qi (University of Science and Technology of China)
Yeyun Gong (Microsoft Research Asia)
Jian Jiao (Microsoft)
Yu Yan (Microsoft)
Weizhu Chen (Microsoft)
Dayiheng Liu (Alibaba DAMO Academy)
Kewen Tang (Microsoft)
Houqiang Li (University of Science and Technology of China)
Jiusheng Chen (Microsoft)
Ruofei Zhang (Microsoft)
Ming Zhou (Microsoft Research)
Nan Duan (Microsoft Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining »
  Fri. Jul 23rd 03:30 -- 03:35 AM
More from the Same Authors
- 2023 Poster: Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise »
  Zhenghao Lin · Yeyun Gong · Yelong Shen · Tong Wu · Zhihao Fan · Chen Lin · Nan Duan · Weizhu Chen
- 2023 Poster: LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation »
  Yixiao Li · Yifan Yu · Qingru Zhang · Chen Liang · Pengcheng He · Weizhu Chen · Tuo Zhao
- 2023 Poster: Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models »
  Zhihong Shao · Yeyun Gong · Yelong Shen · Minlie Huang · Nan Duan · Weizhu Chen
- 2023 Poster: Less is More: Task-aware Layer-wise Distillation for Language Model Compression »
  Chen Liang · Simiao Zuo · Qingru Zhang · Pengcheng He · Weizhu Chen · Tuo Zhao
- 2023 Poster: LongCoder: A Long-Range Pre-trained Language Model for Code Completion »
  Daya Guo · Canwen Xu · Nan Duan · Jian Yin · Julian McAuley
- 2023 Poster: HyperTuning: Toward Adapting Large Language Models without Back-propagation »
  Jason Phang · Yi Mao · Pengcheng He · Weizhu Chen
- 2022 Poster: PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance »
  Qingru Zhang · Simiao Zuo · Chen Liang · Alexander Bukharin · Pengcheng He · Weizhu Chen · Tuo Zhao
- 2022 Spotlight: PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance »
  Qingru Zhang · Simiao Zuo · Chen Liang · Alexander Bukharin · Pengcheng He · Weizhu Chen · Tuo Zhao
- 2022 Poster: Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent »
  Weiming Liu · Huacong Jiang · Bin Li · Houqiang Li
- 2022 Poster: Supervised Off-Policy Ranking »
  Yue Jin · Yue Zhang · Tao Qin · Xudong Zhang · Jian Yuan · Houqiang Li · Tie-Yan Liu
- 2022 Spotlight: Supervised Off-Policy Ranking »
  Yue Jin · Yue Zhang · Tao Qin · Xudong Zhang · Jian Yuan · Houqiang Li · Tie-Yan Liu
- 2022 Spotlight: Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent »
  Weiming Liu · Huacong Jiang · Bin Li · Houqiang Li
- 2021 Poster: EL-Attention: Memory Efficient Lossless Attention for Generation »
  Yu Yan · Jiusheng Chen · Weizhen Qi · Nikhil Bhendawade · Yeyun Gong · Nan Duan · Ruofei Zhang
- 2021 Spotlight: EL-Attention: Memory Efficient Lossless Attention for Generation »
  Yu Yan · Jiusheng Chen · Weizhen Qi · Nikhil Bhendawade · Yeyun Gong · Nan Duan · Ruofei Zhang
- 2021 Poster: SiameseXML: Siamese Networks meet Extreme Classifiers with 100M Labels »
  Kunal Dahiya · Ananye Agarwal · Deepak Saini · Gururaj K · Jian Jiao · Amit Singh · Sumeet Agarwal · Purushottam Kar · Manik Varma
- 2021 Spotlight: SiameseXML: Siamese Networks meet Extreme Classifiers with 100M Labels »
  Kunal Dahiya · Ananye Agarwal · Deepak Saini · Gururaj K · Jian Jiao · Amit Singh · Sumeet Agarwal · Purushottam Kar · Manik Varma
- 2021 Poster: Poolingformer: Long Document Modeling with Pooling Attention »
  Hang ZHANG · Yeyun Gong · Yelong Shen · Weisheng Li · Jiancheng Lv · Nan Duan · Weizhu Chen
- 2021 Spotlight: Poolingformer: Long Document Modeling with Pooling Attention »
  Hang ZHANG · Yeyun Gong · Yelong Shen · Weisheng Li · Jiancheng Lv · Nan Duan · Weizhu Chen