Supervised learning methods excel at capturing the statistical properties of language when trained on large text corpora. Yet these models often produce inconsistent outputs in goal-oriented language settings, as they are not trained to complete the underlying task. Moreover, as soon as the agents are finetuned to maximize task completion, they suffer from the so-called language drift phenomenon: they slowly lose the syntactic and semantic properties of language because they focus only on solving the task. In this paper, we propose a generic approach to counter language drift called Seeded Iterated Learning (SIL). We periodically refine a pretrained student agent by imitating data sampled from a newly generated teacher agent. At each time step, the teacher is created by copying the student agent before being finetuned to maximize task completion. SIL requires neither external syntactic constraints nor semantic knowledge, making it a valuable task-agnostic finetuning protocol. We evaluate SIL in a toy Lewis Game setting and then scale it up to a translation game with natural language. In both settings, SIL counters language drift and improves task completion compared to the baselines.
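The abstract describes SIL only at a high level. The minimal sketch below illustrates how the described teacher/student alternation could be organized; it is written under our own assumptions, and the callables finetune_on_task, sample_data, and imitate, as well as the deep-copy seeding step, are illustrative placeholders rather than the authors' implementation.

import copy
from typing import Any, Callable

def seeded_iterated_learning(
    student: Any,
    finetune_on_task: Callable[[Any], Any],  # e.g. policy-gradient finetuning on task reward (assumed interface)
    sample_data: Callable[[Any], Any],       # generate interaction data from the teacher (assumed interface)
    imitate: Callable[[Any, Any], None],     # supervised imitation update of the student (assumed interface)
    n_generations: int,
) -> Any:
    """Sketch of one SIL schedule: alternate teacher finetuning and student imitation."""
    for _ in range(n_generations):
        # Seed the teacher as a copy of the current student.
        teacher = copy.deepcopy(student)
        # Teacher phase: optimize the teacher for task completion only.
        teacher = finetune_on_task(teacher)
        # Imitation phase: the student imitates data sampled from the teacher,
        # which is intended to filter out drifted utterances.
        data = sample_data(teacher)
        imitate(student, data)
    return student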
Author Information
Yuchen Lu (Mila & University of Montreal)
Soumye Singhal (Mila, University of Montreal)
Florian Strub (DeepMind)
Aaron Courville (Université de Montréal)
Olivier Pietquin (Google Brain)