Poolingformer: Long Document Modeling with Pooling Attention
Hang Zhang · Yeyun Gong · Yelong Shen · Weisheng Li · Jiancheng Lv · Nan Duan · Weizhu Chen

Tue Jul 20 05:45 PM -- 05:50 PM (PDT)

In this paper, we introduce a two-level attention schema, Poolingformer, for long document modeling. Its first level uses a smaller sliding-window pattern to aggregate information from neighbors. Its second level employs a larger window to increase the receptive field, using pooling attention to reduce both computational cost and memory consumption. We first evaluate Poolingformer on two long-sequence QA tasks: the monolingual Natural Questions (NQ) and the multilingual TyDi QA. Experimental results show that Poolingformer sits atop three official leaderboards measured by F1, outperforming previous state-of-the-art models by 1.9 points (79.8 vs. 77.9) on NQ long answer, 1.9 points (79.5 vs. 77.6) on TyDi QA passage answer, and 1.6 points (67.6 vs. 66.0) on TyDi QA minimal answer. We further evaluate Poolingformer on a long-sequence summarization task. Experimental results on the arXiv benchmark continue to demonstrate its superior performance.

Author Information

Hang Zhang (College of Computer Science, Sichuan University)
Yeyun Gong (Microsoft Research Asia)
Yelong Shen (Microsoft)
Weisheng Li (University of Science and Technology of China)
Jiancheng Lv (Sichuan University)
Nan Duan (Microsoft Research)
Weizhu Chen (Microsoft)
