Workshop
Self-supervision in Audio and Speech
Mirco Ravanelli · Dmitriy Serdyuk · R Devon Hjelm · Bhuvana Ramabhadran · Titouan Parcollet
Fri 17 Jul, 12:05 a.m. PDT
Keywords: audio speech self-supervised learning
Even though supervised learning using large annotated corpora is still the dominant approach in machine learning, self-supervised learning is gaining considerable popularity. Applying self-supervised learning to audio and speech sequences, however, remains particularly challenging. Speech signals, in fact, are not only high-dimensional, long, and variable-length sequences, but also entail a complex hierarchical structure that is difficult to infer without supervision (e.g.phonemes, syllables, words). Moreover, speech is characterized by an important variability due to different speaker identities, accents, recording conditions and noises that highly increase the level of complexity.
We believe that self-supervised learning will play a crucial role in the future of artificial intelligence, and we think that great research effort is needed to efficiently take advantage of it in audio and speech applications. With our initiative, we wish to foster more progress in the field, and we hope to encourage a discussion amongst experts and practitioners from both academia and industry that might bring different points of view on this topic. Furthermore, we plan to extend the debate to multiple disciplines, encouraging discussions on how insights from other fields (e.g., computer vision and robotics) can be applied to speech, and how findings on speech can be used on other sequence processing tasks. The workshop will be conceived to promote communication and exchange of ideas between machine learning and speech communities. Throughout a series of invited talks, contributed presentations, poster sessions, as well as a panel discussion we want to foster a fruitful scientific discussion that cannot be done with that level of detail during the main ICML conference.