In this work, we propose "global style tokens" (GSTs), a bank of embeddings that are jointly trained within Tacotron, a state-of-the-art end-to-end speech synthesis system. The embeddings are trained with no explicit labels, yet learn to model a large range of acoustic expressiveness. GSTs lead to a rich set of significant results. The soft interpretable "labels" they generate can be used to control synthesis in novel ways, such as varying speed and speaking style, independently of the text content. They can also be used for style transfer, replicating the speaking style of a single audio clip across an entire long-form text corpus. When trained on noisy, unlabeled found data, GSTs learn to factorize noise and speaker identity, providing a path towards highly scalable but robust speech synthesis.
Author Information
Yuxuan Wang (Google)
Daisy Stanton
Yu Zhang (Google)
RJ Skerry-Ryan
Eric Battenberg
Joel Shor (Google)
Ying Xiao (Google Inc)
Ye Jia (Google)
Fei Ren (Google)
Rif Saurous
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Poster: Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
  Thu. Jul 12th 04:15 -- 07:00 PM Room Hall B #44
More from the Same Authors
- 2023: Generative semi-supervised learning with a neural seq2seq noisy channel
  Soroosh Mariooryad · Matt Shannon · Siyuan Ma · Tom Bagby · David Kao · Daisy Stanton · Eric Battenberg · RJ Skerry-Ryan
- 2023 Poster: Sequential Monte Carlo Learning for Time Series Structure Discovery
  Feras Saad · Brian Patton · Matthew Hoffman · Rif Saurous · Vikash Mansinghka
- 2022 Poster: Self-supervised learning with random-projection quantizer for speech recognition
  Chung-Cheng Chiu · James Qin · Yu Zhang · Jiahui Yu · Yonghui Wu
- 2022 Spotlight: Self-supervised learning with random-projection quantizer for speech recognition
  Chung-Cheng Chiu · James Qin · Yu Zhang · Jiahui Yu · Yonghui Wu
- 2019 Poster: An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
  Behrooz Ghorbani · Shankar Krishnan · Ying Xiao
- 2019 Oral: An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
  Behrooz Ghorbani · Shankar Krishnan · Ying Xiao
- 2018 Poster: Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
  RJ Skerry-Ryan · Eric Battenberg · Ying Xiao · Yuxuan Wang · Daisy Stanton · Joel Shor · Ron Weiss · Robert Clark · Rif Saurous
- 2018 Oral: Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
  RJ Skerry-Ryan · Eric Battenberg · Ying Xiao · Yuxuan Wang · Daisy Stanton · Joel Shor · Ron Weiss · Robert Clark · Rif Saurous
- 2018 Poster: Fixing a Broken ELBO
  Alexander Alemi · Ben Poole · Ian Fischer · Joshua V Dillon · Rif Saurous · Kevin Murphy
- 2018 Oral: Fixing a Broken ELBO
  Alexander Alemi · Ben Poole · Ian Fischer · Joshua V Dillon · Rif Saurous · Kevin Murphy