This paper proposes a general method for improving the structure and quality of sequences generated by a recurrent neural network (RNN), while maintaining information originally learned from data, as well as sample diversity. An RNN is first pre-trained on data using maximum likelihood estimation (MLE), and the probability distribution over the next token in the sequence learned by this model is treated as a prior policy. Another RNN is then trained using reinforcement learning (RL) to generate higher-quality outputs that account for domain-specific incentives while retaining proximity to the prior policy of the MLE RNN. To formalize this objective, we derive novel off-policy RL methods for RNNs from KL-control. The effectiveness of the approach is demonstrated on two applications: 1) generating novel musical melodies, and 2) computational molecular generation. For both problems, we show that the proposed method improves the desired properties and structure of the generated sequences, while maintaining information learned from data.
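The trade-off described in the abstract, rewarding domain-specific incentives while staying close to the MLE prior policy, can be illustrated with a single-step toy example. This is a minimal sketch, not the paper's off-policy methods: it assumes a KL-regularized objective of the form E_pi[r]/c - KL[pi || prior] over one next-token distribution, and the function names, the constant `c`, and the toy numbers are all hypothetical. For this single-step form, the maximizing policy is known in closed form: pi(a) is proportional to prior(a) * exp(r(a)/c).

```python
import math


def kl_divergence(pi, prior):
    """KL[pi || prior] for two discrete distributions over the same tokens.

    Assumes prior has nonzero probability wherever pi does.
    """
    return sum(p * math.log(p / q) for p, q in zip(pi, prior) if p > 0.0)


def kl_regularized_objective(pi, prior, task_reward, c=1.0):
    """Expected task reward scaled by 1/c, minus the KL penalty to the prior.

    c is an assumed trade-off constant: larger c keeps pi closer to the prior.
    """
    expected_reward = sum(p * r for p, r in zip(pi, task_reward))
    return expected_reward / c - kl_divergence(pi, prior)


def optimal_policy(prior, task_reward, c=1.0):
    """Closed-form maximizer of the single-step objective above."""
    weights = [q * math.exp(r / c) for q, r in zip(prior, task_reward)]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


if __name__ == "__main__":
    prior = [0.5, 0.3, 0.2]        # toy next-token probabilities from the MLE model
    task_reward = [0.0, 1.0, 2.0]  # toy domain-specific reward per token
    for c in (0.5, 1.0, 5.0):
        pi = optimal_policy(prior, task_reward, c)
        score = kl_regularized_objective(pi, prior, task_reward, c)
        print(c, [round(p, 3) for p in pi], round(score, 3))
```

As `c` grows, the printed policy moves back toward the prior; as it shrinks, probability mass shifts to the highest-reward token. The sequence-level setting in the paper involves RNN states and multi-step returns, which this sketch deliberately leaves out.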
Author Information
Natasha Jaques (Massachusetts Institute of Technology)
Shixiang Gu (Cambridge)
Dzmitry Bahdanau (Université de Montréal)
Jose Miguel Hernandez-Lobato (University of Cambridge)
Richard E Turner (University of Cambridge)
Richard Turner holds a Lectureship (equivalent to US Assistant Professor) in Computer Vision and Machine Learning in the Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, UK. He is a Fellow of Christ's College Cambridge. Previously, he held an EPSRC Postdoctoral research fellowship which he spent at both the University of Cambridge and the Laboratory for Computational Vision, NYU, USA. He has a PhD degree in Computational Neuroscience and Machine Learning from the Gatsby Computational Neuroscience Unit, UCL, UK and an M.Sci. degree in Natural Sciences (specialism Physics) from the University of Cambridge, UK. His research interests include machine learning, signal processing, and developing probabilistic models of perception.
Douglas Eck (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)
- 2017 Talk: Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control
  Tue. Aug 8th 03:48 -- 04:06 AM Room Parkside 1
More from the Same Authors
- 2021: Attacking Few-Shot Classifiers with Adversarial Support Poisoning
  Elre Oldewage · John Bronskill · Richard E Turner
- 2023: Leveraging Task Structures for Improved Identifiability in Neural Network Representations
  Wenlin Chen · Julien Horwood · Juyeon Heo · Jose Miguel Hernandez-Lobato
- 2023: Beyond Intuition, a Framework for Applying GPs to Real-World Data
  Kenza Tazi · Jihao Andreas Lin · ST John · Hong Ge · Richard E Turner · Ross Viljoen · Alex Gardner
- 2023: Minimal Random Code Learning with Mean-KL Parameterization
  Jihao Andreas Lin · Gergely Flamich · Jose Miguel Hernandez-Lobato
- 2023: Modeling Accurate Long Rollouts with Temporal Neural PDE Solvers
  Phillip Lippe · Bastiaan Veeling · Paris Perdikaris · Richard E Turner · Johannes Brandstetter
- 2022 Poster: Adapting the Linearised Laplace Model Evidence for Modern Deep Learning
  Javier Antorán · David Janz · James Allingham · Erik Daxberger · Riccardo Barbano · Eric Nalisnick · Jose Miguel Hernandez-Lobato
- 2022 Spotlight: Adapting the Linearised Laplace Model Evidence for Modern Deep Learning
  Javier Antorán · David Janz · James Allingham · Erik Daxberger · Riccardo Barbano · Eric Nalisnick · Jose Miguel Hernandez-Lobato
- 2022 Poster: Action-Sufficient State Representation Learning for Control with Structural Constraints
  Biwei Huang · Chaochao Lu · Liu Leqi · Jose Miguel Hernandez-Lobato · Clark Glymour · Bernhard Schölkopf · Kun Zhang
- 2022 Spotlight: Action-Sufficient State Representation Learning for Control with Structural Constraints
  Biwei Huang · Chaochao Lu · Liu Leqi · Jose Miguel Hernandez-Lobato · Clark Glymour · Bernhard Schölkopf · Kun Zhang
- 2022 Poster: Fast Relative Entropy Coding with A* coding
  Gergely Flamich · Stratis Markou · Jose Miguel Hernandez-Lobato
- 2022 Spotlight: Fast Relative Entropy Coding with A* coding
  Gergely Flamich · Stratis Markou · Jose Miguel Hernandez-Lobato
- 2021 Poster: Active Slices for Sliced Stein Discrepancy
  Wenbo Gong · Kaibo Zhang · Yingzhen Li · Jose Miguel Hernandez-Lobato
- 2021 Spotlight: Active Slices for Sliced Stein Discrepancy
  Wenbo Gong · Kaibo Zhang · Yingzhen Li · Jose Miguel Hernandez-Lobato
- 2021 Poster: A Gradient Based Strategy for Hamiltonian Monte Carlo Hyperparameter Optimization
  Andrew Campbell · Wenlong Chen · Vincent Stimper · Jose Miguel Hernandez-Lobato · Yichuan Zhang
- 2021 Spotlight: A Gradient Based Strategy for Hamiltonian Monte Carlo Hyperparameter Optimization
  Andrew Campbell · Wenlong Chen · Vincent Stimper · Jose Miguel Hernandez-Lobato · Yichuan Zhang
- 2021 Poster: Emergent Social Learning via Multi-agent Reinforcement Learning
  Kamal Ndousse · Douglas Eck · Sergey Levine · Natasha Jaques
- 2021 Spotlight: Emergent Social Learning via Multi-agent Reinforcement Learning
  Kamal Ndousse · Douglas Eck · Sergey Levine · Natasha Jaques
- 2021 Poster: Bayesian Deep Learning via Subnetwork Inference
  Erik Daxberger · Eric Nalisnick · James Allingham · Javier Antorán · Jose Miguel Hernandez-Lobato
- 2021 Spotlight: Bayesian Deep Learning via Subnetwork Inference
  Erik Daxberger · Eric Nalisnick · James Allingham · Javier Antorán · Jose Miguel Hernandez-Lobato
- 2020: "Latent Space Optimization with Deep Generative Models"
  Jose Miguel Hernandez-Lobato
- 2020: Invited Talk: Efficient Missing-value Acquisition with Variational Autoencoders
  Jose Miguel Hernandez-Lobato
- 2020 Poster: Reinforcement Learning for Molecular Design Guided by Quantum Mechanics
  Gregor Simm · Robert Pinsler · Jose Miguel Hernandez-Lobato
- 2020 Poster: A Generative Model for Molecular Distance Geometry
  Gregor Simm · Jose Miguel Hernandez-Lobato
- 2020 Poster: Scalable Exact Inference in Multi-Output Gaussian Processes
  Wessel Bruinsma · Eric Perim Martins · William Tebbutt · Scott Hosking · Arno Solin · Richard E Turner
- 2020 Poster: TaskNorm: Rethinking Batch Normalization for Meta-Learning
  John Bronskill · Jonathan Gordon · James Requeima · Sebastian Nowozin · Richard E Turner
- 2019 Poster: Dropout as a Structured Shrinkage Prior
  Eric Nalisnick · Jose Miguel Hernandez-Lobato · Padhraic Smyth
- 2019 Oral: Dropout as a Structured Shrinkage Prior
  Eric Nalisnick · Jose Miguel Hernandez-Lobato · Padhraic Smyth
- 2019 Poster: Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
  Natasha Jaques · Angeliki Lazaridou · Edward Hughes · Caglar Gulcehre · Pedro Ortega · DJ Strouse · Joel Z Leibo · Nando de Freitas
- 2019 Poster: EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE
  Chao Ma · Sebastian Tschiatschek · Konstantina Palla · Jose Miguel Hernandez-Lobato · Sebastian Nowozin · Cheng Zhang
- 2019 Poster: Variational Implicit Processes
  Chao Ma · Yingzhen Li · Jose Miguel Hernandez-Lobato
- 2019 Oral: Variational Implicit Processes
  Chao Ma · Yingzhen Li · Jose Miguel Hernandez-Lobato
- 2019 Oral: EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE
  Chao Ma · Sebastian Tschiatschek · Konstantina Palla · Jose Miguel Hernandez-Lobato · Sebastian Nowozin · Cheng Zhang
- 2019 Oral: Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
  Natasha Jaques · Angeliki Lazaridou · Edward Hughes · Caglar Gulcehre · Pedro Ortega · DJ Strouse · Joel Z Leibo · Nando de Freitas
- 2019 Poster: Learning to Groove with Inverse Sequence Transformations
  Jon Gillick · Adam Roberts · Jesse Engel · Douglas Eck · David Bamman
- 2019 Oral: Learning to Groove with Inverse Sequence Transformations
  Jon Gillick · Adam Roberts · Jesse Engel · Douglas Eck · David Bamman
- 2018 Poster: Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning
  Stefan Depeweg · Jose Miguel Hernandez-Lobato · Finale Doshi-Velez · Steffen Udluft
- 2018 Poster: The Mirage of Action-Dependent Baselines in Reinforcement Learning
  George Tucker · Surya Bhupatiraju · Shixiang Gu · Richard E Turner · Zoubin Ghahramani · Sergey Levine
- 2018 Poster: A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music
  Adam Roberts · Jesse Engel · Colin Raffel · Curtis Hawthorne · Douglas Eck
- 2018 Oral: Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning
  Stefan Depeweg · Jose Miguel Hernandez-Lobato · Finale Doshi-Velez · Steffen Udluft
- 2018 Oral: A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music
  Adam Roberts · Jesse Engel · Colin Raffel · Curtis Hawthorne · Douglas Eck
- 2018 Oral: The Mirage of Action-Dependent Baselines in Reinforcement Learning
  George Tucker · Surya Bhupatiraju · Shixiang Gu · Richard E Turner · Zoubin Ghahramani · Sergey Levine
- 2018 Poster: Structured Evolution with Compact Architectures for Scalable Policy Optimization
  Krzysztof Choromanski · Mark Rowland · Vikas Sindhwani · Richard E Turner · Adrian Weller
- 2018 Oral: Structured Evolution with Compact Architectures for Scalable Policy Optimization
  Krzysztof Choromanski · Mark Rowland · Vikas Sindhwani · Richard E Turner · Adrian Weller
- 2017 Poster: Magnetic Hamiltonian Monte Carlo
  Nilesh Tripuraneni · Mark Rowland · Zoubin Ghahramani · Richard E Turner
- 2017 Poster: Parallel and Distributed Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space
  Jose Miguel Hernandez-Lobato · James Requeima · Edward Pyzer-Knapp · Alan Aspuru-Guzik
- 2017 Poster: Grammar Variational Autoencoder
  Matt J. Kusner · Brooks Paige · Jose Miguel Hernandez-Lobato
- 2017 Talk: Grammar Variational Autoencoder
  Matt J. Kusner · Brooks Paige · Jose Miguel Hernandez-Lobato
- 2017 Talk: Magnetic Hamiltonian Monte Carlo
  Nilesh Tripuraneni · Mark Rowland · Zoubin Ghahramani · Richard E Turner
- 2017 Talk: Parallel and Distributed Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space
  Jose Miguel Hernandez-Lobato · James Requeima · Edward Pyzer-Knapp · Alan Aspuru-Guzik
- 2017 Poster: Online and Linear-Time Attention by Enforcing Monotonic Alignments
  Colin Raffel · Thang Luong · Peter Liu · Ron Weiss · Douglas Eck
- 2017 Poster: Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
  Cinjon Resnick · Adam Roberts · Jesse Engel · Douglas Eck · Sander Dieleman · Karen Simonyan · Mohammad Norouzi
- 2017 Talk: Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
  Cinjon Resnick · Adam Roberts · Jesse Engel · Douglas Eck · Sander Dieleman · Karen Simonyan · Mohammad Norouzi
- 2017 Talk: Online and Linear-Time Attention by Enforcing Monotonic Alignments
  Colin Raffel · Thang Luong · Peter Liu · Ron Weiss · Douglas Eck