Self-supervised learning (SSL) in speech trains a speech representation network on a large-scale unannotated speech corpus and then applies the learned representations to downstream tasks. Since most downstream tasks of speech SSL focus primarily on the content information in speech, the most desirable speech representations should be able to disentangle unwanted variations, such as speaker variations, from the content. However, disentangling speakers is very challenging, because removing the speaker information could easily result in a loss of content as well, and the damage to content usually far outweighs the benefit of removing speaker information. In this paper, we propose a new SSL method that achieves speaker disentanglement without severe loss of content. Our approach is adapted from the HuBERT framework and incorporates disentangling mechanisms to regularize both the teacher labels and the learned representations. We evaluate the benefit of speaker disentanglement on a set of content-related downstream tasks and observe a consistent and notable performance advantage of our speaker-disentangled representations.
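The abstract describes adapting HuBERT with disentangling mechanisms that regularize both the teacher labels and the learned representations. Below is a minimal, illustrative sketch (not the authors' implementation) of how a student-side regularizer of this kind could be combined with a HuBERT-style masked-prediction loss: the encoder architecture, the use of two speaker-perturbed views of the same utterance, the cosine-agreement term, and all names and shapes are assumptions made for illustration only.

```python
# Minimal sketch (assumed, not the ContentVec implementation): a HuBERT-style
# masked-prediction objective plus a regularizer that encourages the student
# representations of two speaker-perturbed views of the same utterance to agree.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StudentEncoder(nn.Module):
    """Stand-in for a transformer speech encoder (hypothetical, kept small for clarity)."""

    def __init__(self, feat_dim=80, hidden_dim=256, num_clusters=100):
        super().__init__()
        self.backbone = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.cluster_head = nn.Linear(hidden_dim, num_clusters)  # predicts teacher cluster labels

    def forward(self, feats):
        hidden, _ = self.backbone(feats)     # (B, T, hidden_dim)
        logits = self.cluster_head(hidden)   # (B, T, num_clusters)
        return hidden, logits


def masked_prediction_loss(logits, teacher_labels, mask):
    """HuBERT-style loss: predict teacher cluster labels on masked frames only."""
    return F.cross_entropy(logits[mask], teacher_labels[mask])


def speaker_invariance_loss(hidden_a, hidden_b):
    """Penalize disagreement between representations of two speaker-perturbed views
    of the same utterance (one possible form of a student-side regularizer)."""
    return 1.0 - F.cosine_similarity(hidden_a, hidden_b, dim=-1).mean()


def training_step(model, feats_view_a, feats_view_b, teacher_labels, mask, lam=0.1):
    """Combine the two objectives for one optimization step.

    feats_view_a / feats_view_b: features of the same utterance after two different
    speaker perturbations (e.g. pitch/formant shifts) -- an assumption for this sketch.
    teacher_labels: frame-level cluster IDs produced by a (speaker-normalized) teacher.
    """
    hidden_a, logits_a = model(feats_view_a)
    hidden_b, _ = model(feats_view_b)

    loss_pred = masked_prediction_loss(logits_a, teacher_labels, mask)
    loss_inv = speaker_invariance_loss(hidden_a, hidden_b)
    return loss_pred + lam * loss_inv


if __name__ == "__main__":
    B, T, D = 2, 50, 80
    model = StudentEncoder(feat_dim=D)
    feats_a = torch.randn(B, T, D)           # view A of the utterance (perturbed)
    feats_b = torch.randn(B, T, D)           # view B of the same utterance (perturbed differently)
    labels = torch.randint(0, 100, (B, T))   # teacher cluster labels per frame
    mask = torch.rand(B, T) < 0.5            # frames selected for masked prediction
    loss = training_step(model, feats_a, feats_b, labels, mask)
    loss.backward()
    print(float(loss))
```

The weight `lam` balancing the invariance term against the masked-prediction loss is a placeholder hyperparameter in this sketch, not a value taken from the paper.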
Author Information
Kaizhi Qian (MIT-IBM Watson AI Lab)
Yang Zhang (MIT-IBM Watson AI Lab)
Heting Gao (University of Illinois at Urbana-Champaign)
Junrui Ni (University of Illinois at Urbana-Champaign)
Cheng-I Lai (MIT)
David Cox (MIT-IBM Watson AI Lab)
Mark Hasegawa-Johnson (University of Illinois)
Shiyu Chang (UCSB)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers »
  Thu. Jul 21st through Fri. Jul 22nd, Room: Hall E #126
More from the Same Authors
- 2023 Poster: Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models »
  Guanhua Zhang · Jiabao Ji · Yang Zhang · Mo Yu · Tommi Jaakkola · Shiyu Chang
- 2023 Poster: Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning »
  Zhongzhi Yu · Yang Zhang · Kaizhi Qian · Cheng Wan · Yonggan Fu · Yongan Zhang · Yingyan (Celine) Lin
- 2023 Poster: PromptBoosting: Black-Box Text Classification with Ten Forward Passes »
  Bairu Hou · Joe O'Connor · Jacob Andreas · Shiyu Chang · Yang Zhang
- 2022 Poster: Learning Stable Classifiers by Transferring Unstable Features »
  Yujia Bao · Shiyu Chang · Regina Barzilay
- 2022 Poster: Data-Efficient Double-Win Lottery Tickets from Robust Pre-training »
  Tianlong Chen · Zhenyu Zhang · Sijia Liu · Yang Zhang · Shiyu Chang · Zhangyang “Atlas” Wang
- 2022 Poster: Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness »
  Tianlong Chen · Huan Zhang · Zhenyu Zhang · Shiyu Chang · Sijia Liu · Pin-Yu Chen · Zhangyang “Atlas” Wang
- 2022 Spotlight: Data-Efficient Double-Win Lottery Tickets from Robust Pre-training »
  Tianlong Chen · Zhenyu Zhang · Sijia Liu · Yang Zhang · Shiyu Chang · Zhangyang “Atlas” Wang
- 2022 Spotlight: Learning Stable Classifiers by Transferring Unstable Features »
  Yujia Bao · Shiyu Chang · Regina Barzilay
- 2022 Spotlight: Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness »
  Tianlong Chen · Huan Zhang · Zhenyu Zhang · Shiyu Chang · Sijia Liu · Pin-Yu Chen · Zhangyang “Atlas” Wang
- 2022 Poster: Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization »
  Yihua Zhang · Guanhua Zhang · Prashant Khanduri · Mingyi Hong · Shiyu Chang · Sijia Liu
- 2022 Spotlight: Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization »
  Yihua Zhang · Guanhua Zhang · Prashant Khanduri · Mingyi Hong · Shiyu Chang · Sijia Liu
- 2022 Poster: Forget-free Continual Learning with Winning Subnetworks »
  Haeyong Kang · Rusty Mina · Sultan Rizky Hikmawan Madjid · Jaehong Yoon · Mark Hasegawa-Johnson · Sung Ju Hwang · Chang Yoo
- 2022 Spotlight: Forget-free Continual Learning with Winning Subnetworks »
  Haeyong Kang · Rusty Mina · Sultan Rizky Hikmawan Madjid · Jaehong Yoon · Mark Hasegawa-Johnson · Sung Ju Hwang · Chang Yoo
- 2021 Poster: Global Prosody Style Transfer Without Text Transcriptions »
  Kaizhi Qian · Yang Zhang · Shiyu Chang · Jinjun Xiong · Chuang Gan · David Cox · Mark Hasegawa-Johnson
- 2021 Oral: Global Prosody Style Transfer Without Text Transcriptions »
  Kaizhi Qian · Yang Zhang · Shiyu Chang · Jinjun Xiong · Chuang Gan · David Cox · Mark Hasegawa-Johnson
- 2021 Poster: Predict then Interpolate: A Simple Algorithm to Learn Stable Classifiers »
  Yujia Bao · Shiyu Chang · Regina Barzilay
- 2021 Spotlight: Predict then Interpolate: A Simple Algorithm to Learn Stable Classifiers »
  Yujia Bao · Shiyu Chang · Regina Barzilay
- 2021 Poster: Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators »
  Yonggan Fu · Yongan Zhang · Yang Zhang · David Cox · Yingyan Lin
- 2021 Spotlight: Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators »
  Yonggan Fu · Yongan Zhang · Yang Zhang · David Cox · Yingyan Lin
- 2020 Poster: Invariant Rationalization »
  Shiyu Chang · Yang Zhang · Mo Yu · Tommi Jaakkola
- 2020 Poster: Proper Network Interpretability Helps Adversarial Robustness in Classification »
  Akhilan Boopathy · Sijia Liu · Gaoyuan Zhang · Cynthia Liu · Pin-Yu Chen · Shiyu Chang · Luca Daniel
- 2020 Poster: Unsupervised Speech Decomposition via Triple Information Bottleneck »
  Kaizhi Qian · Yang Zhang · Shiyu Chang · Mark Hasegawa-Johnson · David Cox
- 2019 Poster: AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss »
  Kaizhi Qian · Yang Zhang · Shiyu Chang · Xuesong Yang · Mark Hasegawa-Johnson
- 2019 Oral: AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss »
  Kaizhi Qian · Yang Zhang · Shiyu Chang · Xuesong Yang · Mark Hasegawa-Johnson