The Variational Autoencoder (VAE) has proven to be an effective model for producing semantically meaningful latent representations for natural data. However, it has thus far seen limited application to sequential data, and, as we demonstrate, existing recurrent VAE models have difficulty modeling sequences with long-term structure. To address this issue, we propose the use of a hierarchical decoder, which first outputs embeddings for subsequences of the input and then uses these embeddings to generate each subsequence independently. This structure encourages the model to utilize its latent code, thereby avoiding the "posterior collapse" problem which remains an issue for recurrent VAEs. We apply this architecture to modeling sequences of musical notes and find that it exhibits dramatically better sampling, interpolation, and reconstruction performance than a "flat" baseline model. An implementation of our "MusicVAE" is available online at https://goo.gl/magenta/musicvae-code.
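The two-level decoding described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: plain tanh recurrences with random weights stand in for whatever recurrent layers the actual model uses, and every name here (`init_params`, `hierarchical_decode`, the weight keys, the dimensions) is hypothetical. It shows only the control flow: an upper-level recurrence turns the latent code into one embedding per subsequence, and a lower-level decoder is re-initialized from each embedding so that every subsequence is generated without access to the others.

```python
import numpy as np


def init_params(rng, latent_dim, hidden_dim, vocab_size):
    """Random weights for the sketch (hypothetical shapes, untrained)."""
    def w(*shape):
        return rng.normal(scale=0.5, size=shape)
    return {
        "W_init": w(latent_dim, hidden_dim),               # z -> upper-level state
        "W_up": w(hidden_dim, hidden_dim),                 # upper-level recurrence
        "W_dinit": w(hidden_dim, hidden_dim),              # embedding -> decoder state
        "W_dx": w(vocab_size + hidden_dim, hidden_dim),    # decoder input weights
        "W_dh": w(hidden_dim, hidden_dim),                 # decoder recurrence
        "W_out": w(hidden_dim, vocab_size),                # decoder state -> logits
    }


def hierarchical_decode(z, params, num_subseq, steps_per_subseq):
    """Greedy-decode z into num_subseq * steps_per_subseq token ids."""
    vocab_size = params["W_out"].shape[1]
    h_up = np.tanh(z @ params["W_init"])
    embeddings, tokens = [], []
    for _ in range(num_subseq):
        # Upper level: emit one embedding per subsequence.
        h_up = np.tanh(h_up @ params["W_up"])
        embedding = h_up
        embeddings.append(embedding)
        # Lower level: state is reset from the embedding alone, so this
        # subsequence is generated independently of the previous ones.
        h_dec = np.tanh(embedding @ params["W_dinit"])
        prev = np.zeros(vocab_size)
        for _ in range(steps_per_subseq):
            x = np.concatenate([prev, embedding])
            h_dec = np.tanh(x @ params["W_dx"] + h_dec @ params["W_dh"])
            tok = int(np.argmax(h_dec @ params["W_out"]))
            tokens.append(tok)
            prev = np.zeros(vocab_size)
            prev[tok] = 1.0
    return tokens, np.stack(embeddings)


rng = np.random.default_rng(0)
params = init_params(rng, latent_dim=8, hidden_dim=16, vocab_size=12)
z = rng.normal(size=8)  # a sample from the prior over latent codes
tokens, embeddings = hierarchical_decode(z, params, num_subseq=4, steps_per_subseq=2)
print(len(tokens), embeddings.shape)
```

Because the lower-level decoder only ever sees the current embedding, the only route for information about the whole sequence is through the latent code, which is the property the abstract credits with encouraging the model to use that code.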
Author Information
Adam Roberts (Google Brain)
Jesse Engel (Google Brain)
Colin Raffel (Google)
Curtis Hawthorne (Google Brain)
Douglas Eck (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Oral: A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music
  Thu. Jul 12th 12:50 -- 01:00 PM Room Victoria
More from the Same Authors
- 2023 Poster: The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
  Shayne Longpre · Le Hou · Tu Vu · Albert Webson · Hyung Won Chung · Yi Tay · Denny Zhou · Quoc Le · Barret Zoph · Jason Wei · Adam Roberts
- 2023 Poster: Large Language Models Struggle to Learn Long-Tail Knowledge
  Nikhil Kandpal · Haikang Deng · Adam Roberts · Eric Wallace · Colin Raffel
- 2022 Poster: General-purpose, long-context autoregressive modeling with Perceiver AR
  Curtis Hawthorne · Drew Jaegle · Cătălina Cangea · Sebastian Borgeaud · Charlie Nash · Mateusz Malinowski · Sander Dieleman · Oriol Vinyals · Matthew Botvinick · Ian Simon · Hannah Sheahan · Neil Zeghidour · Jean-Baptiste Alayrac · Joao Carreira · Jesse Engel
- 2022 Spotlight: General-purpose, long-context autoregressive modeling with Perceiver AR
  Curtis Hawthorne · Drew Jaegle · Cătălina Cangea · Sebastian Borgeaud · Charlie Nash · Mateusz Malinowski · Sander Dieleman · Oriol Vinyals · Matthew Botvinick · Ian Simon · Hannah Sheahan · Neil Zeghidour · Jean-Baptiste Alayrac · Joao Carreira · Jesse Engel
- 2021 Poster: Emergent Social Learning via Multi-agent Reinforcement Learning
  Kamal Ndousse · Douglas Eck · Sergey Levine · Natasha Jaques
- 2021 Spotlight: Emergent Social Learning via Multi-agent Reinforcement Learning
  Kamal Ndousse · Douglas Eck · Sergey Levine · Natasha Jaques
- 2020 : Self-supervised Pitch Detection by Inverse Audio Synthesis
  Jesse Engel
- 2020 Poster: Encoding Musical Style with Transformer Autoencoders
  Kristy Choi · Curtis Hawthorne · Ian Simon · Monica Dinculescu · Jesse Engel
- 2019 Poster: Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
  Yao Qin · Nicholas Carlini · Garrison Cottrell · Ian Goodfellow · Colin Raffel
- 2019 Oral: Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
  Yao Qin · Nicholas Carlini · Garrison Cottrell · Ian Goodfellow · Colin Raffel
- 2019 Poster: Learning to Groove with Inverse Sequence Transformations
  Jon Gillick · Adam Roberts · Jesse Engel · Douglas Eck · David Bamman
- 2019 Oral: Learning to Groove with Inverse Sequence Transformations
  Jon Gillick · Adam Roberts · Jesse Engel · Douglas Eck · David Bamman
- 2018 Poster: Is Generator Conditioning Causally Related to GAN Performance?
  Augustus Odena · Jacob Buckman · Catherine Olsson · Tom B Brown · Christopher Olah · Colin Raffel · Ian Goodfellow
- 2018 Oral: Is Generator Conditioning Causally Related to GAN Performance?
  Augustus Odena · Jacob Buckman · Catherine Olsson · Tom B Brown · Christopher Olah · Colin Raffel · Ian Goodfellow
- 2017 Poster: Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control
  Natasha Jaques · Shixiang Gu · Dzmitry Bahdanau · Jose Miguel Hernandez-Lobato · Richard E Turner · Douglas Eck
- 2017 Poster: Online and Linear-Time Attention by Enforcing Monotonic Alignments
  Colin Raffel · Thang Luong · Peter Liu · Ron Weiss · Douglas Eck
- 2017 Poster: Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
  Cinjon Resnick · Adam Roberts · Jesse Engel · Douglas Eck · Sander Dieleman · Karen Simonyan · Mohammad Norouzi
- 2017 Talk: Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
  Cinjon Resnick · Adam Roberts · Jesse Engel · Douglas Eck · Sander Dieleman · Karen Simonyan · Mohammad Norouzi
- 2017 Talk: Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control
  Natasha Jaques · Shixiang Gu · Dzmitry Bahdanau · Jose Miguel Hernandez-Lobato · Richard E Turner · Douglas Eck
- 2017 Talk: Online and Linear-Time Attention by Enforcing Monotonic Alignments
  Colin Raffel · Thang Luong · Peter Liu · Ron Weiss · Douglas Eck