Timezone: »
Despite the progress in voice conversion, many-to-many voice conversion trained on non-parallel data, as well as zero-shot voice conversion, remains under-explored. Deep style transfer algorithms, generative adversarial networks (GAN) in particular, are being applied as new solutions in this field. However, GAN training is very sophisticated and difficult, and there is no strong evidence that its generated speech is of good perceptual quality. In this paper, we propose a new style transfer scheme that involves only an autoencoder with a carefully designed bottleneck. We formally show that this scheme can achieve distribution-matching style transfer by training only on self-reconstruction loss. Based on this scheme, we proposed AutoVC, which achieves state-of-the-art results in many-to-many voice conversion with non-parallel data, and which is the first to perform zero-shot voice conversion.
Author Information
Kaizhi Qian (UIUC)
Yang Zhang (IBM-MIT Research Lab)
Shiyu Chang (MIT-IBM Watson AI Lab)
Xuesong Yang (University of Illinois at Urbana-Champaign)
Mark Hasegawa-Johnson (University of Illinois)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Oral: AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss »
Thu. Jun 13th 05:10 -- 05:15 PM Room Room 201
More from the Same Authors
-
2023 Poster: Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models »
Guanhua Zhang · Jiabao Ji · Yang Zhang · Mo Yu · Tommi Jaakkola · Shiyu Chang -
2023 Poster: Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning »
Zhongzhi Yu · Yang Zhang · Kaizhi Qian · Cheng Wan · Yonggan Fu · Yongan Zhang · Yingyan (Celine) Lin -
2023 Poster: PromptBoosting: Black-Box Text Classification with Ten Forward Passes »
Bairu Hou · Joe O'Connor · Jacob Andreas · Shiyu Chang · Yang Zhang -
2022 Poster: Data-Efficient Double-Win Lottery Tickets from Robust Pre-training »
Tianlong Chen · Zhenyu Zhang · Sijia Liu · Yang Zhang · Shiyu Chang · Zhangyang “Atlas” Wang -
2022 Poster: ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers »
Kaizhi Qian · Yang Zhang · Heting Gao · Junrui Ni · Cheng-I Lai · David Cox · Mark Hasegawa-Johnson · Shiyu Chang -
2022 Spotlight: Data-Efficient Double-Win Lottery Tickets from Robust Pre-training »
Tianlong Chen · Zhenyu Zhang · Sijia Liu · Yang Zhang · Shiyu Chang · Zhangyang “Atlas” Wang -
2022 Spotlight: ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers »
Kaizhi Qian · Yang Zhang · Heting Gao · Junrui Ni · Cheng-I Lai · David Cox · Mark Hasegawa-Johnson · Shiyu Chang -
2022 Poster: Forget-free Continual Learning with Winning Subnetworks »
Haeyong Kang · Rusty Mina · Sultan Rizky Hikmawan Madjid · Jaehong Yoon · Mark Hasegawa-Johnson · Sung Ju Hwang · Chang Yoo -
2022 Spotlight: Forget-free Continual Learning with Winning Subnetworks »
Haeyong Kang · Rusty Mina · Sultan Rizky Hikmawan Madjid · Jaehong Yoon · Mark Hasegawa-Johnson · Sung Ju Hwang · Chang Yoo -
2021 Poster: Global Prosody Style Transfer Without Text Transcriptions »
Kaizhi Qian · Yang Zhang · Shiyu Chang · Jinjun Xiong · Chuang Gan · David Cox · Mark Hasegawa-Johnson -
2021 Oral: Global Prosody Style Transfer Without Text Transcriptions »
Kaizhi Qian · Yang Zhang · Shiyu Chang · Jinjun Xiong · Chuang Gan · David Cox · Mark Hasegawa-Johnson -
2021 Poster: Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators »
Yonggan Fu · Yongan Zhang · Yang Zhang · David Cox · Yingyan Lin -
2021 Spotlight: Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators »
Yonggan Fu · Yongan Zhang · Yang Zhang · David Cox · Yingyan Lin -
2020 Poster: Invariant Rationalization »
Shiyu Chang · Yang Zhang · Mo Yu · Tommi Jaakkola -
2020 Poster: Proper Network Interpretability Helps Adversarial Robustness in Classification »
Akhilan Boopathy · Sijia Liu · Gaoyuan Zhang · Cynthia Liu · Pin-Yu Chen · Shiyu Chang · Luca Daniel -
2020 Poster: Unsupervised Speech Decomposition via Triple Information Bottleneck »
Kaizhi Qian · Yang Zhang · Shiyu Chang · Mark Hasegawa-Johnson · David Cox