We present a simple and effective self-supervised learning approach for speech recognition. The approach trains a model to predict masked speech signals in the form of discrete labels generated by a random-projection quantizer. In particular, the quantizer projects speech inputs with a randomly initialized matrix and performs a nearest-neighbor lookup in a randomly initialized codebook. Neither the matrix nor the codebook is updated during self-supervised learning. Because the random-projection quantizer is not trained and is separate from the speech recognition model, the approach is flexible and compatible with universal speech recognition architectures. On LibriSpeech, our approach achieves word error rates similar to those of previous self-supervised work with non-streaming models, and lower word error rates than previous work with streaming models. On multilingual tasks, the approach also provides significant improvements over wav2vec 2.0 and w2v-BERT.
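The quantizer described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`make_quantizer`, `quantize`), the Gaussian initialization, and the dimensions are assumptions chosen for illustration, since the abstract specifies only a frozen random projection followed by a nearest-neighbor codebook lookup.

```python
import numpy as np


def make_quantizer(input_dim, code_dim=16, codebook_size=8192, seed=0):
    """Create a frozen random projection matrix and a frozen random codebook.

    The Gaussian initialization and default sizes are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    projection = rng.standard_normal((input_dim, code_dim))
    codebook = rng.standard_normal((codebook_size, code_dim))
    return projection, codebook


def quantize(features, projection, codebook):
    """Map each frame of `features` (shape [T, input_dim]) to a codebook index."""
    projected = features @ projection  # [T, code_dim]
    # Euclidean distance from every projected frame to every codebook entry.
    diffs = projected[:, None, :] - codebook[None, :, :]  # [T, K, code_dim]
    distances = np.linalg.norm(diffs, axis=-1)  # [T, K]
    # The nearest codebook entry gives the discrete label for each frame.
    return distances.argmin(axis=1)  # [T]


# Hypothetical usage: 100 frames of 80-dimensional speech features.
projection, codebook = make_quantizer(input_dim=80, codebook_size=256)
frames = np.random.default_rng(1).standard_normal((100, 80))
labels = quantize(frames, projection, codebook)
```

In this sketch, `labels` would serve as the prediction targets for the masked frames. Neither `projection` nor `codebook` receives gradient updates, so the quantizer stays independent of whatever model is trained against it.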
Author Information
Chung-Cheng Chiu (Google Brain)
James Qin (Google)
Yu Zhang (Google)
Jiahui Yu (Google)
Yonghui Wu (Google)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: Self-supervised learning with random-projection quantizer for speech recognition
  Thu. Jul 21st 08:20 -- 08:25 PM Room Hall G
More from the Same Authors
- 2022 Poster: GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
  Nan Du · Yanping Huang · Andrew Dai · Simon Tong · Dmitry Lepikhin · Yuanzhong Xu · Maxim Krikun · Yanqi Zhou · Adams Wei Yu · Orhan Firat · Barret Zoph · William Fedus · Maarten Bosma · Zongwei Zhou · Tao Wang · Emma Wang · Kellie Webster · Marie Pellat · Kevin Robinson · Kathleen Meier-Hellstern · Toju Duke · Lucas Dixon · Kun Zhang · Quoc Le · Yonghui Wu · Zhifeng Chen · Claire Cui
- 2022 Spotlight: GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
  Nan Du · Yanping Huang · Andrew Dai · Simon Tong · Dmitry Lepikhin · Yuanzhong Xu · Maxim Krikun · Yanqi Zhou · Adams Wei Yu · Orhan Firat · Barret Zoph · William Fedus · Maarten Bosma · Zongwei Zhou · Tao Wang · Emma Wang · Kellie Webster · Marie Pellat · Kevin Robinson · Kathleen Meier-Hellstern · Toju Duke · Lucas Dixon · Kun Zhang · Quoc Le · Yonghui Wu · Zhifeng Chen · Claire Cui
- 2020 Expo Talk Panel: Baidu AutoDL: Automated and Interpretable Deep Learning
  Bolei Zhou · Yi Yang · Quanshi Zhang · Dejing Dou · Haoyi Xiong · Jiahui Yu · Humphrey Shi · Linchao Zhu · Xingjian Li
- 2018 Poster: Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
  Yuxuan Wang · Daisy Stanton · Yu Zhang · RJ Skerry-Ryan · Eric Battenberg · Joel Shor · Ying Xiao · Ye Jia · Fei Ren · Rif Saurous
- 2018 Oral: Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
  Yuxuan Wang · Daisy Stanton · Yu Zhang · RJ Skerry-Ryan · Eric Battenberg · Joel Shor · Ying Xiao · Ye Jia · Fei Ren · Rif Saurous