Non-autoregressive Transformers (NATs) significantly reduce decoding latency by generating all tokens in parallel. However, such independent predictions prevent NATs from capturing the dependencies between tokens needed to generate multiple possible translations. In this paper, we propose the Directed Acyclic Transformer (DA-Transformer), which organizes the hidden states in a Directed Acyclic Graph (DAG), where each path of the DAG corresponds to a specific translation. The whole DAG simultaneously captures multiple translations and facilitates fast prediction in a non-autoregressive fashion. Experiments on the raw training data of the WMT benchmark show that DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average, making it the first NAT model to achieve results competitive with autoregressive Transformers without relying on knowledge distillation.
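The abstract describes hidden states organized as a DAG whose vertices each predict a token and whose edges carry transition probabilities, so a target sentence can be scored by marginalizing over all paths through the graph. Below is a minimal sketch (not the authors' implementation) of that path-marginalization idea using dynamic programming; the names `emit`, `trans`, and `dag_sequence_logprob`, and the toy sizes, are illustrative assumptions.

```python
# Minimal sketch of scoring a target sequence against a DAG of decoder vertices
# by summing over all paths (dynamic programming). Assumed inputs: emit[v] is the
# token distribution at vertex v; trans[u, v] is the probability of moving from
# vertex u to a later vertex v (strictly upper-triangular, so the graph is acyclic).
import numpy as np

def dag_sequence_logprob(emit, trans, target):
    """log P(target) = log sum over vertex paths v_1 < ... < v_n of
    prod_i emit[v_i, target_i] * prod_i trans[v_i, v_{i+1}]."""
    num_vertices = emit.shape[0]
    n = len(target)
    # alpha[i, v]: total probability of emitting target[:i+1] with token i placed at vertex v
    alpha = np.zeros((n, num_vertices))
    alpha[0, 0] = emit[0, target[0]]                 # every path starts at vertex 0
    for i in range(1, n):
        # step from any earlier vertex u to vertex v, then emit target[i] at v
        alpha[i] = (alpha[i - 1] @ trans) * emit[:, target[i]]
    # every path ends at the last vertex
    return np.log(alpha[n - 1, num_vertices - 1] + 1e-12)

# Toy example: 4 DAG vertices, vocabulary of 5 tokens (sizes are arbitrary).
rng = np.random.default_rng(0)
emit = rng.dirichlet(np.ones(5), size=4)             # (vertices, vocab)
trans = np.triu(rng.random((4, 4)), k=1)             # strictly upper-triangular transitions
trans = trans / np.maximum(trans.sum(-1, keepdims=True), 1e-12)
print(dag_sequence_logprob(emit, trans, target=[2, 0, 3]))
```

In this framing, training maximizes the marginal likelihood over paths, while decoding can pick a high-probability path and read off its tokens in parallel, which is the non-autoregressive property the abstract emphasizes.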
Author Information
Fei Huang (Tsinghua University)
Hao Zhou (Bytedance)
Yang Liu (Tsinghua University)
Hang Li (Bytedance Technology)
Minlie Huang (Tsinghua University)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: Directed Acyclic Transformer for Non-Autoregressive Machine Translation »
  Tue, Jul 19 through Wed, Jul 20, Hall E #126
More from the Same Authors
- 2023 Poster: Improving Adversarial Robustness of Deep Equilibrium Models with Explicit Regulations Along the Neural Dynamics »
  Zonghan Yang · Peng Li · Tianyu Pang · Yang Liu
- 2023 Poster: Weak Proxies are Sufficient and Preferable for Fairness with Missing Sensitive Attributes »
  Zhaowei Zhu · Yuanshun Yao · Jiankai Sun · Hang Li · Yang Liu
- 2023 Poster: End-to-End Full-Atom Antibody Design »
  Xiangzhe Kong · Wenbing Huang · Yang Liu
- 2022 Poster: Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts »
  Yan Zeng · Xinsong Zhang · Hang Li
- 2022 Spotlight: Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts »
  Yan Zeng · Xinsong Zhang · Hang Li
- 2022 Poster: On the Learning of Non-Autoregressive Transformers »
  Fei Huang · Tianhua Tao · Hao Zhou · Lei Li · Minlie Huang
- 2022 Spotlight: On the Learning of Non-Autoregressive Transformers »
  Fei Huang · Tianhua Tao · Hao Zhou · Lei Li · Minlie Huang
- 2020 Poster: Dispersed Exponential Family Mixture VAEs for Interpretable Text Generation »
  Wenxian Shi · Hao Zhou · Ning Miao · Lei Li
- 2020 Poster: Interpolation between Residual and Non-Residual Networks »
  Zonghan Yang · Yang Liu · Chenglong Bao · Zuoqiang Shi