Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks. Most such models use deterministic attention, while stochastic attention is less explored due to optimization difficulties or complicated model design. This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights with a hierarchy of gamma distributions, and an encoder network by stacking Weibull distributions with a deterministic-upward-stochastic-downward structure to approximate the posterior. The resulting auto-encoding networks can be optimized in a differentiable way with a variational lower bound. Any model with deterministic attention, including pretrained ones, can be easily converted to the proposed Bayesian attention belief networks. On a variety of language understanding tasks, we show that our method outperforms deterministic attention and state-of-the-art stochastic attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks. We further demonstrate the general applicability of our method on neural machine translation and visual question answering, showing great potential for incorporating our method into various attention-related tasks.
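The core mechanics described in the abstract (draw positive unnormalized attention weights from a reparameterizable Weibull, normalize them, and penalize them against a gamma prior through a KL term in the variational lower bound) can be illustrated in a few lines. This is a minimal single-layer sketch under stated assumptions, not the authors' implementation: the helper names, the Weibull shape `k`, the gamma prior parameters `alpha` and `beta`, and the rule of setting each Weibull scale from the deterministic softmax weights are all hypothetical, and the paper itself uses a hierarchy of gamma distributions and a deterministic-upward-stochastic-downward encoder rather than one independent layer. The sampler uses the standard Weibull inverse-CDF reparameterization, and the KL term is the known closed form from a Weibull to a gamma (shape, rate) distribution.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def softmax(scores):
    """Numerically stable softmax: the deterministic attention weights."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_weibull(k, lam, rng):
    """Reparameterized Weibull(k, lam) draw via the inverse CDF.

    w = lam * (-log(1 - u)) ** (1 / k) with u ~ Uniform[0, 1), so w is a
    differentiable function of (k, lam) given the noise u.
    """
    u = rng.random()
    return lam * (-math.log(1.0 - u)) ** (1.0 / k)


def kl_weibull_gamma(k, lam, alpha, beta):
    """Closed-form KL(Weibull(k, lam) || Gamma(shape=alpha, rate=beta))."""
    return (EULER_GAMMA * alpha / k
            - alpha * math.log(lam)
            + math.log(k)
            + beta * lam * math.gamma(1.0 + 1.0 / k)
            - EULER_GAMMA - 1.0
            - alpha * math.log(beta)
            + math.lgamma(alpha))


def stochastic_attention(scores, k=10.0, alpha=1.0, beta=1.0, seed=0):
    """Hypothetical one-layer stochastic attention sketch.

    Returns normalized sampled attention weights and the summed KL term
    that would enter the (negative) variational lower bound.
    """
    rng = random.Random(seed)
    probs = softmax(scores)
    # Assumption: choose each Weibull scale so that its mean,
    # lam * Gamma(1 + 1/k), equals the deterministic softmax weight.
    lams = [p / math.gamma(1.0 + 1.0 / k) for p in probs]
    # Positive unnormalized weights, then normalize them to sum to one.
    unnormalized = [sample_weibull(k, lam, rng) for lam in lams]
    total = sum(unnormalized)
    weights = [w / total for w in unnormalized]
    # KL of each Weibull posterior factor against the gamma prior.
    kl = sum(kl_weibull_gamma(k, lam, alpha, beta) for lam in lams)
    return weights, kl


weights, kl = stochastic_attention([2.0, 0.5, -1.0])
print(weights, kl)
```

With a large shape `k`, each Weibull draw concentrates near its mean, so the sampled weights stay close to the deterministic softmax weights; a smaller `k` injects more noise into the attention.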
Author Information
Shujian Zhang (UT Austin)
Xinjie Fan (UT Austin)
Bo Chen (School of Electronic Engineering, Xidian University)
Bo Chen, Ph.D., Professor. Before joining the Department of Electronic Engineering at Xidian University in 2013, I was a postdoctoral researcher, research scientist, and senior research scientist in the Department of Electrical and Computer Engineering at Duke University. In 2013 and 2014, I was selected for the Program for New Century Excellent Talents in University and the Program for Thousand Youth Talents, respectively. I am interested in developing statistical machine learning methods for complex and large-scale data. My current interests are statistical signal processing, statistical machine learning, deep learning, and their applications to radar target detection and recognition.
Mingyuan Zhou (University of Texas at Austin)
Related Events (a corresponding poster, oral, or spotlight)
2021 Poster: Bayesian Attention Belief Networks
Thu. Jul 22nd 04:00 -- 06:00 PM
More from the Same Authors
2023 Poster: Prototype-oriented unsupervised anomaly detection for multivariate time series
yuxin li · Wenchao Chen · Bo Chen · Dongsheng Wang · Long Tian · Mingyuan Zhou
2023 Poster: Bayesian Progressive Deep Topic Model with Knowledge Informed Textual Data Coarsening Process
Zhibin Duan · Xinyang Liu · Yudi Su · Yishi Xu · Bo Chen · Mingyuan Zhou
2023 Poster: Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling
Tianqi Chen · Mingyuan Zhou
2023 Poster: POUF: Prompt-Oriented Unsupervised Fine-tuning for Large Pre-trained Models
Korawat Tanwisuth · Shujian Zhang · Huangjie Zheng · Pengcheng He · Mingyuan Zhou
2022 Poster: Deep Variational Graph Convolutional Recurrent Network for Multivariate Time Series Anomaly Detection
Wenchao Chen · Long Tian · Bo Chen · Liang Dai · Zhibin Duan · Mingyuan Zhou
2022 Poster: Bayesian Deep Embedding Topic Meta-Learner
Zhibin Duan · Yishi Xu · Jianqiao Sun · Bo Chen · Wenchao Chen · CHAOJIE WANG · Mingyuan Zhou
2022 Spotlight: Bayesian Deep Embedding Topic Meta-Learner
Zhibin Duan · Yishi Xu · Jianqiao Sun · Bo Chen · Wenchao Chen · CHAOJIE WANG · Mingyuan Zhou
2022 Spotlight: Deep Variational Graph Convolutional Recurrent Network for Multivariate Time Series Anomaly Detection
Wenchao Chen · Long Tian · Bo Chen · Liang Dai · Zhibin Duan · Mingyuan Zhou
2022 Poster: Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning
Shentao Yang · Yihao Feng · Shujian Zhang · Mingyuan Zhou
2022 Spotlight: Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning
Shentao Yang · Yihao Feng · Shujian Zhang · Mingyuan Zhou
2021 Poster: Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network
Zhibin Duan · Dongsheng Wang · Bo Chen · CHAOJIE WANG · Wenchao Chen · yewen li · Jie Ren · Mingyuan Zhou
2021 Poster: ARMS: Antithetic-REINFORCE-Multi-Sample Gradient for Binary Variables
Alek Dimitriev · Mingyuan Zhou
2021 Spotlight: ARMS: Antithetic-REINFORCE-Multi-Sample Gradient for Binary Variables
Alek Dimitriev · Mingyuan Zhou
2021 Spotlight: Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network
Zhibin Duan · Dongsheng Wang · Bo Chen · CHAOJIE WANG · Wenchao Chen · yewen li · Jie Ren · Mingyuan Zhou
2020 Poster: On hyperparameter tuning in general clustering problems
Xinjie Fan · Yuguang Yue · Purnamrita Sarkar · Y. X. Rachel Wang
2020 Poster: Thompson Sampling via Local Uncertainty
Zhendong Wang · Mingyuan Zhou
2020 Poster: Bayesian Graph Neural Networks with Adaptive Connection Sampling
Arman Hasanzadeh · Ehsan Hajiramezanali · Shahin Boluki · Mingyuan Zhou · Nick Duffield · Krishna Narayanan · Xiaoning Qian
2020 Poster: Recurrent Hierarchical Topic-Guided RNN for Language Generation
Dandan Guo · Bo Chen · Ruiying Lu · Mingyuan Zhou
2019 Poster: ARSM: Augment-REINFORCE-Swap-Merge Estimator for Gradient Backpropagation Through Categorical Variables
Mingzhang Yin · Yuguang Yue · Mingyuan Zhou
2019 Oral: ARSM: Augment-REINFORCE-Swap-Merge Estimator for Gradient Backpropagation Through Categorical Variables
Mingzhang Yin · Yuguang Yue · Mingyuan Zhou
2019 Poster: Convolutional Poisson Gamma Belief Network
CHAOJIE WANG · Bo Chen · SUCHENG XIAO · Mingyuan Zhou
2019 Poster: Locally Private Bayesian Inference for Count Models
Aaron Schein · Steven Wu · Alexandra Schofield · Mingyuan Zhou · Hanna Wallach
2019 Oral: Convolutional Poisson Gamma Belief Network
CHAOJIE WANG · Bo Chen · SUCHENG XIAO · Mingyuan Zhou
2019 Oral: Locally Private Bayesian Inference for Count Models
Aaron Schein · Steven Wu · Alexandra Schofield · Mingyuan Zhou · Hanna Wallach
2018 Poster: Inter and Intra Topic Structure Learning with Word Embeddings
He Zhao · Lan Du · Wray Buntine · Mingyuan Zhou
2018 Oral: Inter and Intra Topic Structure Learning with Word Embeddings
He Zhao · Lan Du · Wray Buntine · Mingyuan Zhou
2018 Poster: Semi-Implicit Variational Inference
Mingzhang Yin · Mingyuan Zhou
2018 Oral: Semi-Implicit Variational Inference
Mingzhang Yin · Mingyuan Zhou
2017 Poster: Deep Latent Dirichlet Allocation with Topic-Layer-Adaptive Stochastic Gradient Riemannian MCMC
Yulai Cong · Bo Chen · Hongwei Liu · Mingyuan Zhou
2017 Talk: Deep Latent Dirichlet Allocation with Topic-Layer-Adaptive Stochastic Gradient Riemannian MCMC
Yulai Cong · Bo Chen · Hongwei Liu · Mingyuan Zhou