Timezone: »
Generative models in vision have seen rapid progress due to algorithmic improvements and the availability of high-quality image datasets. In this paper, we offer contributions in both these areas to enable similar progress in audio modeling. First, we detail a powerful new WaveNet-style autoencoder model that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform. Second, we introduce NSynth, a large-scale and high-quality dataset of musical notes that is an order of magnitude larger than comparable public datasets. Using NSynth, we demonstrate improved qualitative and quantitative performance of the WaveNet autoencoder over a well-tuned spectral autoencoder baseline. Finally, we show that the model learns a manifold of embeddings that allows for morphing between instruments, meaningfully interpolating in timbre to create new types of sounds that are realistic and expressive.
Author Information
Cinjon Resnick (Google Brain)
Adam Roberts (Google Brain)
Jesse Engel (Google Brain)
Douglas Eck (Google Brain)
Sander Dieleman (DeepMind)
Karen Simonyan (DeepMind)
Mohammad Norouzi (Google)
Related Events (a corresponding poster, oral, or spotlight)
-
2017 Talk: Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders »
Tue. Aug 8th 04:42 -- 05:00 AM Room Parkside 1
More from the Same Authors
-
2023 Poster: The Flan Collection: Designing Data and Methods for Effective Instruction Tuning »
Shayne Longpre · Le Hou · Tu Vu · Albert Webson · Hyung Won Chung · Yi Tay · Denny Zhou · Quoc Le · Barret Zoph · Jason Wei · Adam Roberts -
2023 Poster: Large Language Models Struggle to Learn Long-Tail Knowledge »
Nikhil Kandpal · Haikang Deng · Adam Roberts · Eric Wallace · Colin Raffel -
2022 Workshop: Machine Learning for Audio Synthesis »
Rachel Manzelli · Brian Kulis · Sadie Allen · Sander Dieleman · Yu Zhang -
2022 Poster: General-purpose, long-context autoregressive modeling with Perceiver AR »
Curtis Hawthorne · Drew Jaegle · Cătălina Cangea · Sebastian Borgeaud · Charlie Nash · Mateusz Malinowski · Sander Dieleman · Oriol Vinyals · Matthew Botvinick · Ian Simon · Hannah Sheahan · Neil Zeghidour · Jean-Baptiste Alayrac · Joao Carreira · Jesse Engel -
2022 Spotlight: General-purpose, long-context autoregressive modeling with Perceiver AR »
Curtis Hawthorne · Drew Jaegle · Cătălina Cangea · Sebastian Borgeaud · Charlie Nash · Mateusz Malinowski · Sander Dieleman · Oriol Vinyals · Matthew Botvinick · Ian Simon · Hannah Sheahan · Neil Zeghidour · Jean-Baptiste Alayrac · Joao Carreira · Jesse Engel -
2021 Poster: Generating images with sparse representations »
Charlie Nash · Jacob Menick · Sander Dieleman · Peter Battaglia -
2021 Poster: Emergent Social Learning via Multi-agent Reinforcement Learning »
Kamal Ndousse · Douglas Eck · Sergey Levine · Natasha Jaques -
2021 Poster: High-Performance Large-Scale Image Recognition Without Normalization »
Andy Brock · Soham De · Samuel Smith · Karen Simonyan -
2021 Spotlight: High-Performance Large-Scale Image Recognition Without Normalization »
Andy Brock · Soham De · Samuel Smith · Karen Simonyan -
2021 Oral: Generating images with sparse representations »
Charlie Nash · Jacob Menick · Sander Dieleman · Peter Battaglia -
2021 Spotlight: Emergent Social Learning via Multi-agent Reinforcement Learning »
Kamal Ndousse · Douglas Eck · Sergey Levine · Natasha Jaques -
2020 : Self-supervised Pitch Detection by Inverse Audio Synthesis »
Jesse Engel -
2020 Poster: Off-Policy Actor-Critic with Shared Experience Replay »
Simon Schmitt · Matteo Hessel · Karen Simonyan -
2020 Poster: Encoding Musical Style with Transformer Autoencoders »
Kristy Choi · Curtis Hawthorne · Ian Simon · Monica Dinculescu · Jesse Engel -
2020 Poster: Imputer: Sequence Modelling via Imputation and Dynamic Programming »
William Chan · Chitwan Saharia · Geoffrey Hinton · Mohammad Norouzi · Navdeep Jaitly -
2020 Poster: An Optimistic Perspective on Offline Deep Reinforcement Learning »
Rishabh Agarwal · Dale Schuurmans · Mohammad Norouzi -
2020 Poster: A Simple Framework for Contrastive Learning of Visual Representations »
Ting Chen · Simon Kornblith · Mohammad Norouzi · Geoffrey Hinton -
2019 Poster: Similarity of Neural Network Representations Revisited »
Simon Kornblith · Mohammad Norouzi · Honglak Lee · Geoffrey Hinton -
2019 Poster: Learning to Generalize from Sparse and Underspecified Rewards »
Rishabh Agarwal · Chen Liang · Dale Schuurmans · Mohammad Norouzi -
2019 Oral: Similarity of Neural Network Representations Revisited »
Simon Kornblith · Mohammad Norouzi · Honglak Lee · Geoffrey Hinton -
2019 Oral: Learning to Generalize from Sparse and Underspecified Rewards »
Rishabh Agarwal · Chen Liang · Dale Schuurmans · Mohammad Norouzi -
2019 Poster: Understanding the Impact of Entropy on Policy Optimization »
Zafarali Ahmed · Nicolas Le Roux · Mohammad Norouzi · Dale Schuurmans -
2019 Oral: Understanding the Impact of Entropy on Policy Optimization »
Zafarali Ahmed · Nicolas Le Roux · Mohammad Norouzi · Dale Schuurmans -
2019 Poster: Learning to Groove with Inverse Sequence Transformations »
Jon Gillick · Adam Roberts · Jesse Engel · Douglas Eck · David Bamman -
2019 Oral: Learning to Groove with Inverse Sequence Transformations »
Jon Gillick · Adam Roberts · Jesse Engel · Douglas Eck · David Bamman -
2018 Poster: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures »
Lasse Espeholt · Hubert Soyer · Remi Munos · Karen Simonyan · Vlad Mnih · Tom Ward · Yotam Doron · Vlad Firoiu · Tim Harley · Iain Dunning · Shane Legg · Koray Kavukcuoglu -
2018 Poster: Parallel WaveNet: Fast High-Fidelity Speech Synthesis »
Aäron van den Oord · Yazhe Li · Igor Babuschkin · Karen Simonyan · Oriol Vinyals · Koray Kavukcuoglu · George van den Driessche · Edward Lockhart · Luis C Cobo · Florian Stimberg · Norman Casagrande · Dominik Grewe · Seb Noury · Sander Dieleman · Erich Elsen · Nal Kalchbrenner · Heiga Zen · Alex Graves · Helen King · Tom Walters · Dan Belov · Demis Hassabis -
2018 Poster: Efficient Neural Audio Synthesis »
Nal Kalchbrenner · Erich Elsen · Karen Simonyan · Seb Noury · Norman Casagrande · Edward Lockhart · Florian Stimberg · Aäron van den Oord · Sander Dieleman · Koray Kavukcuoglu -
2018 Oral: Parallel WaveNet: Fast High-Fidelity Speech Synthesis »
Aäron van den Oord · Yazhe Li · Igor Babuschkin · Karen Simonyan · Oriol Vinyals · Koray Kavukcuoglu · George van den Driessche · Edward Lockhart · Luis C Cobo · Florian Stimberg · Norman Casagrande · Dominik Grewe · Seb Noury · Sander Dieleman · Erich Elsen · Nal Kalchbrenner · Heiga Zen · Alex Graves · Helen King · Tom Walters · Dan Belov · Demis Hassabis -
2018 Oral: Efficient Neural Audio Synthesis »
Nal Kalchbrenner · Erich Elsen · Karen Simonyan · Seb Noury · Norman Casagrande · Edward Lockhart · Florian Stimberg · Aäron van den Oord · Sander Dieleman · Koray Kavukcuoglu -
2018 Oral: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures »
Lasse Espeholt · Hubert Soyer · Remi Munos · Karen Simonyan · Vlad Mnih · Tom Ward · Yotam Doron · Vlad Firoiu · Tim Harley · Iain Dunning · Shane Legg · Koray Kavukcuoglu -
2018 Poster: Smoothed Action Value Functions for Learning Gaussian Policies »
Ofir Nachum · Mohammad Norouzi · George Tucker · Dale Schuurmans -
2018 Poster: A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music »
Adam Roberts · Jesse Engel · Colin Raffel · Curtis Hawthorne · Douglas Eck -
2018 Oral: Smoothed Action Value Functions for Learning Gaussian Policies »
Ofir Nachum · Mohammad Norouzi · George Tucker · Dale Schuurmans -
2018 Oral: A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music »
Adam Roberts · Jesse Engel · Colin Raffel · Curtis Hawthorne · Douglas Eck -
2018 Poster: Learning to search with MCTSnets »
Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver -
2018 Oral: Learning to search with MCTSnets »
Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver -
2017 : NSynth: Unsupervised Understanding of Musical Notes »
Cinjon Resnick -
2017 Poster: Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs »
Michael Gygli · Mohammad Norouzi · Anelia Angelova -
2017 Poster: Device Placement Optimization with Reinforcement Learning »
Azalia Mirhoseini · Hieu Pham · Quoc Le · benoit steiner · Mohammad Norouzi · Rasmus Larsen · Yuefeng Zhou · Naveen Kumar · Samy Bengio · Jeff Dean -
2017 Talk: Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs »
Michael Gygli · Mohammad Norouzi · Anelia Angelova -
2017 Talk: Device Placement Optimization with Reinforcement Learning »
Azalia Mirhoseini · Hieu Pham · Quoc Le · benoit steiner · Mohammad Norouzi · Rasmus Larsen · Yuefeng Zhou · Naveen Kumar · Samy Bengio · Jeff Dean -
2017 Poster: Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control »
Natasha Jaques · Shixiang Gu · Dzmitry Bahdanau · Jose Miguel Hernandez-Lobato · Richard E Turner · Douglas Eck -
2017 Poster: Online and Linear-Time Attention by Enforcing Monotonic Alignments »
Colin Raffel · Thang Luong · Peter Liu · Ron Weiss · Douglas Eck -
2017 Talk: Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control »
Natasha Jaques · Shixiang Gu · Dzmitry Bahdanau · Jose Miguel Hernandez-Lobato · Richard E Turner · Douglas Eck -
2017 Talk: Online and Linear-Time Attention by Enforcing Monotonic Alignments »
Colin Raffel · Thang Luong · Peter Liu · Ron Weiss · Douglas Eck -
2017 Poster: Video Pixel Networks »
Nal Kalchbrenner · Karen Simonyan · Aäron van den Oord · Ivo Danihelka · Oriol Vinyals · Alex Graves · Koray Kavukcuoglu -
2017 Talk: Video Pixel Networks »
Nal Kalchbrenner · Karen Simonyan · Aäron van den Oord · Ivo Danihelka · Oriol Vinyals · Alex Graves · Koray Kavukcuoglu