

Talk in Workshop: Self-supervision in Audio and Speech

Invited Talk: Unsupervised pre-training of bidirectional speech encoders via masked reconstruction

Karen Livescu


Abstract:

We propose an approach for pre-training speech representations via a masked reconstruction loss. Our pre-trained encoder networks are bidirectional and can therefore be used directly in typical bidirectional speech recognition models. The pre-trained networks can then be fine-tuned on a smaller amount of labelled data for speech recognition. In addition, we address the problem of domain differences between the pre-training and fine-tuning data by adding an explicit adaptation layer during fine-tuning. Experiments with this approach on the LibriSpeech and Wall Street Journal corpora show promising results. The gain from pre-training is additive to that from supervised data augmentation.
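To make the pre-training objective concrete, here is a minimal PyTorch-style sketch of masked reconstruction with a bidirectional encoder. It is illustrative only, not the authors' implementation: the input features (log-mel spectrograms), the frame-level zero masking, the LSTM architecture, the MSE loss, and all hyperparameters are assumptions for the example; the actual method may mask spectrogram regions differently and use a different network.

    import torch
    import torch.nn as nn

    class BiEncoder(nn.Module):
        """Bidirectional encoder that can later be dropped into a
        standard bidirectional ASR model for fine-tuning."""
        def __init__(self, n_mels=80, hidden=256):
            super().__init__()
            self.rnn = nn.LSTM(n_mels, hidden, num_layers=3,
                               bidirectional=True, batch_first=True)
            # Reconstruction head; discarded after pre-training.
            self.head = nn.Linear(2 * hidden, n_mels)

        def forward(self, x):          # x: (batch, time, n_mels)
            h, _ = self.rnn(x)
            return self.head(h)

    def masked_reconstruction_loss(model, x, mask_ratio=0.15):
        # Mask a random subset of frames (zeroed here for simplicity)
        # and reconstruct only the masked positions.
        mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio
        x_masked = x.masked_fill(mask.unsqueeze(-1), 0.0)
        recon = model(x_masked)
        # MSE computed only where frames were masked.
        return ((recon - x) ** 2)[mask].mean()

    # Usage: one pre-training step on a batch of unlabelled audio.
    model = BiEncoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    x = torch.randn(8, 200, 80)   # dummy batch: 8 utterances, 200 frames
    loss = masked_reconstruction_loss(model, x)
    opt.zero_grad()
    loss.backward()
    opt.step()

After pre-training, the reconstruction head would be replaced by an ASR output layer and the encoder fine-tuned on labelled data; the explicit adaptation layer mentioned in the abstract could be a small additional module inserted at that stage, though its exact form is not specified here.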

Link to the video: https://slideslive.com/38930736/unsupervised-pretraining-of-bidirectional-speech-encoders-via-masked-reconstruction
