Invited talk in Workshop: Machine Learning for Audio Synthesis
Frontiers and challenges in music audio generation
Chris Donahue
Despite notable recent progress on generative modeling of text, images, and speech, generative modeling of music audio remains a challenging frontier for machine learning. A primary obstacle to modeling audio is the extreme sequence length of audio waveforms, which makes them impractical to model directly with standard methods. A challenge more specific to music audio is scaling to critical capacity, an elusive threshold of model size beyond which coherent generation emerges. In this talk, I will present strategies from my work that seek to overcome the practical challenges of modeling audio by either (1) exploring featurizations that reduce superfluous information in waveforms, or (2) proposing new methods that can process waveforms directly. I will also share insights from ongoing work on achieving critical capacity for generating broad music audio, i.e., music audio not constrained to a particular instrument or genre.