Timezone: »

Focused Hierarchical RNNs for Conditional Sequence Processing
Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal

Thu Jul 12 04:30 AM -- 04:50 AM (PDT) @ Victoria

Recurrent Neural Networks (RNNs) with attention mechanisms have obtained state-of-the-art results for many sequence processing tasks. Most of these models use a simple form of encoder with attention that looks over the entire sequence and assigns a weight to each token independently.We present a mechanism for focusing RNN encoders for sequence modelling tasks which allows them to attend to key parts of the input as needed. We formulate this using a multi-layer conditional %hierarchical sequence encoder that reads in one token at a time and makes a discrete decision on whether the token is relevant to the context or question being asked. The discrete gating mechanism takes in the context embedding and the current hidden state as inputs and controls information flow into the layer above. We train it using policy gradient methods. We evaluate this method on several types of tasks with different attributes. First, we evaluate the method on synthetic tasks which allow us to evaluate the model for its generalization ability and probe the behavior of the gates in more controlled settings. We then evaluate this approach on large scale Question Answering tasks including the challenging MS MARCO and SearchQA tasks. Our models shows consistent improvements for both tasks over prior work and our baselines. It has also shown to generalize significantly better on synthetic tasks as compared to the baselines.

Author Information

Rosemary Nan Ke (MILA, University of Montreal)

I am a PhD student at Mila, I am advised by Chris Pal and Yoshua Bengio. My research interest are efficient credit assignment, causal learning and model-based reinforcement learning. Here is my homepage https://nke001.github.io/

Konrad Zolna (Universite de Montreal, Jagiellonian University)
Alessandro Sordoni (Microsoft Research)
Zhouhan Lin (MILA, University of Montreal)
Adam Trischler (Microsoft Research)
Yoshua Bengio (Mila / U. Montreal)

Yoshua Bengio is recognized as one of the world’s leading experts in artificial intelligence and a pioneer in deep learning. Since 1993, he has been a professor in the Department of Computer Science and Operational Research at the Université de Montréal. He is the founder and scientific director of Mila, the Quebec Institute of Artificial Intelligence, the world’s largest university-based research group in deep learning. He is a member of the NeurIPS board and co-founder and general chair for the ICLR conference, as well as program director of the CIFAR program on Learning in Machines and Brains and is Fellow of the same institution. In 2018, Yoshua Bengio ranked as the computer scientist with the most new citations, worldwide, thanks to his many publications. In 2019, he received the ACM A.M. Turing Award, “the Nobel Prize of Computing”, jointly with Geoffrey Hinton and Yann LeCun for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing. In 2020 he was nominated Fellow of the Royal Society of London.

Joelle Pineau (McGill University / Facebook)
Laurent Charlin (McGill University)
Christopher Pal (École Polytechnique de Montréal)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors