We present an extension to the Tacotron speech synthesis architecture that learns a latent embedding space of prosody, derived from a reference acoustic representation containing the desired prosody. We show that conditioning Tacotron on this learned embedding space results in synthesized audio that matches the prosody of the reference signal in fine temporal detail, even when the reference and synthesis speakers are different. Additionally, we show that a reference prosody embedding can be used to synthesize text that is different from that of the reference utterance. We define several quantitative and subjective metrics for evaluating prosody transfer, and report results with accompanying audio samples from single-speaker and 44-speaker Tacotron models on a prosody transfer task.
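The core idea in the abstract is that a variable-length reference signal is summarized into one fixed-length prosody embedding, which then conditions the synthesizer. The sketch below illustrates that data flow only, under stated assumptions. `ToyReferenceEncoder` and `condition_on_prosody` are hypothetical names. The encoder is a plain tanh RNN with random, untrained weights standing in for a trained (and deeper) reference encoder. The conditioning step assumes the embedding is broadcast to every text-encoder timestep and concatenated there, one simple way to feed a single vector to an attention-based decoder. This is not the authors' implementation.

```python
import math
import random
from typing import List

Vector = List[float]


def make_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> List[Vector]:
    """Random weight matrix; a stand-in for trained parameters (assumption)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m: List[Vector], v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ToyReferenceEncoder:
    """Hypothetical reference encoder: a tanh RNN that reads a variable-length
    sequence of reference frames (e.g. spectrogram frames) and returns its final
    hidden state as a fixed-length prosody embedding."""

    def __init__(self, frame_dim: int, embed_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_in = make_matrix(embed_dim, frame_dim, rng)
        self.w_rec = make_matrix(embed_dim, embed_dim, rng)
        self.embed_dim = embed_dim

    def __call__(self, frames: List[Vector]) -> Vector:
        h = [0.0] * self.embed_dim
        for frame in frames:
            # h_t = tanh(W_in x_t + W_rec h_{t-1})
            a = matvec(self.w_in, frame)
            b = matvec(self.w_rec, h)
            h = [math.tanh(x + y) for x, y in zip(a, b)]
        # Same size regardless of how many reference frames were read.
        return h


def condition_on_prosody(text_encodings: List[Vector], prosody: Vector) -> List[Vector]:
    """Broadcast one prosody embedding to every text-encoder timestep and
    concatenate it onto that timestep's encoding (assumed conditioning scheme)."""
    return [enc + prosody for enc in text_encodings]


# Usage with arbitrary toy sizes: 40 reference frames of 80 bins,
# 12 text-encoder steps of 256 dims, a 32-dim prosody embedding.
rng = random.Random(1)
reference = [[rng.random() for _ in range(80)] for _ in range(40)]
text = [[rng.random() for _ in range(256)] for _ in range(12)]

encoder = ToyReferenceEncoder(frame_dim=80, embed_dim=32)
prosody = encoder(reference)
conditioned = condition_on_prosody(text, prosody)
print(len(prosody), len(conditioned), len(conditioned[0]))  # 32 12 288
```

Because the embedding has a fixed size that does not depend on the reference length or its transcript, the same vector can be paired with different input text. That is the property the abstract relies on when transferring a reference's prosody to text that differs from the reference utterance.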
Author Information
RJ Skerry-Ryan (Google, Inc.)
Eric Battenberg
Ying Xiao (Google, Inc.)
Yuxuan Wang (Google)
Daisy Stanton
Joel Shor (Google)
Ron Weiss (Google Brain)
Robert Clark (Google UK)
Rif Saurous
Related Events (a corresponding poster, oral, or spotlight)
2018 Oral: Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
Thu. Jul 12th 02:00 -- 02:20 PM Room A3
More from the Same Authors
2023: Generative semi-supervised learning with a neural seq2seq noisy channel
Soroosh Mariooryad · Matt Shannon · Siyuan Ma · Tom Bagby · David Kao · Daisy Stanton · Eric Battenberg · RJ Skerry-Ryan
2023 Poster: Sequential Monte Carlo Learning for Time Series Structure Discovery
Feras Saad · Brian Patton · Matthew Hoffman · Rif Saurous · Vikash Mansinghka
2019 Poster: CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
Tom Kenter · Vincent Wan · Chun-an Chan · Robert Clark · Jakub Vit
2019 Oral: CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
Tom Kenter · Vincent Wan · Chun-an Chan · Robert Clark · Jakub Vit
2019 Poster: An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
Behrooz Ghorbani · Shankar Krishnan · Ying Xiao
2019 Oral: An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
Behrooz Ghorbani · Shankar Krishnan · Ying Xiao
2018 Poster: Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang · Daisy Stanton · Yu Zhang · RJ Skerry-Ryan · Eric Battenberg · Joel Shor · Ying Xiao · Ye Jia · Fei Ren · Rif Saurous
2018 Oral: Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang · Daisy Stanton · Yu Zhang · RJ Skerry-Ryan · Eric Battenberg · Joel Shor · Ying Xiao · Ye Jia · Fei Ren · Rif Saurous
2018 Poster: Fixing a Broken ELBO
Alexander Alemi · Ben Poole · Ian Fischer · Joshua V Dillon · Rif Saurous · Kevin Murphy
2018 Oral: Fixing a Broken ELBO
Alexander Alemi · Ben Poole · Ian Fischer · Joshua V Dillon · Rif Saurous · Kevin Murphy
2017 Poster: Online and Linear-Time Attention by Enforcing Monotonic Alignments
Colin Raffel · Thang Luong · Peter Liu · Ron Weiss · Douglas Eck
2017 Talk: Online and Linear-Time Attention by Enforcing Monotonic Alignments
Colin Raffel · Thang Luong · Peter Liu · Ron Weiss · Douglas Eck