Contributed talk in Workshop: Machine Learning for Audio Synthesis
Generating Detailed Music Datasets with Neural Audio Synthesis
Yusong Wu
Generative models are increasingly able to produce realistic, high-quality data in the domains of both symbolic music (e.g., MIDI) and raw audio. These models have also become increasingly controllable, allowing deliberate and systematic manipulation of their outputs toward desired characteristics. However, despite the demonstrated benefits of synthetic data for low-resource learning in other domains, research has not yet leveraged generative models to create large-scale datasets suitable for modern deep learning models in the music domain. In this work, we address this gap by pairing a generative model of MIDI (Coconet, trained on Bach Chorales) with a structured audio synthesis model (MIDI-DDSP, trained on URMP). The resulting system can produce unlimited amounts of realistic chorale music with rich annotations by sampling MIDI from the generative model and synthesizing it under explicit control. We call this system the Chamber Ensemble Generator (CEG), and use it to generate a large dataset of chorales (CocoChorales). We demonstrate that data generated with our approach improves state-of-the-art models for music transcription and source separation, and we release both the system and the dataset as an open-source foundation for future work.
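To make the two-stage pipeline concrete, the Python sketch below shows its overall structure: sample a multi-voice chorale as a note sequence, then render those notes to audio while keeping the aligned per-note labels. The function names and data layout are hypothetical stand-ins for Coconet sampling and MIDI-DDSP synthesis under stated assumptions; they are not the actual CEG or MIDI-DDSP APIs.

# Illustrative two-stage pipeline: a note-level generative model produces a
# MIDI-like note sequence, and a structured synthesis step renders it to audio
# while exposing per-note annotations. All names here are hypothetical
# placeholders, not the real Coconet or MIDI-DDSP interfaces.
from dataclasses import dataclass
import numpy as np

SAMPLE_RATE = 16000

@dataclass
class Note:
    pitch: int        # MIDI pitch number
    start: float      # onset time in seconds
    end: float        # offset time in seconds
    instrument: str   # voice/instrument label for the stem

def sample_chorale_notes(num_voices: int = 4) -> list[Note]:
    """Placeholder for sampling a four-part chorale from a Coconet-style model."""
    rng = np.random.default_rng()
    notes = []
    for voice in range(num_voices):
        t = 0.0
        for _ in range(16):
            dur = float(rng.choice([0.25, 0.5, 1.0]))
            notes.append(Note(pitch=int(rng.integers(48, 84)),
                              start=t, end=t + dur,
                              instrument=f"voice_{voice}"))
            t += dur
    return notes

def synthesize_notes(notes: list[Note]) -> tuple[np.ndarray, dict]:
    """Placeholder for MIDI-DDSP-style synthesis: returns mixed audio plus
    per-note annotations (here, just the fundamental frequency of each note)."""
    total_dur = max(n.end for n in notes)
    audio = np.zeros(int(total_dur * SAMPLE_RATE))
    f0_annotations = []
    for n in notes:
        f0 = 440.0 * 2 ** ((n.pitch - 69) / 12)  # MIDI pitch -> Hz
        t = np.arange(int((n.end - n.start) * SAMPLE_RATE)) / SAMPLE_RATE
        start_idx = int(n.start * SAMPLE_RATE)
        audio[start_idx:start_idx + len(t)] += 0.1 * np.sin(2 * np.pi * f0 * t)
        f0_annotations.append({"instrument": n.instrument,
                               "start": n.start, "end": n.end, "f0_hz": f0})
    return audio, {"notes": notes, "f0": f0_annotations}

# Generate one paired example: audio plus aligned symbolic and acoustic labels.
notes = sample_chorale_notes()
audio, annotations = synthesize_notes(notes)

Because every example is generated rather than recorded, each audio clip comes with exact ground truth (note events, per-voice stems, and synthesis parameters such as pitch contours), which is what makes the resulting data directly usable for training transcription and source separation models.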