Workshop: The ICML Expressive Vocalizations (ExVo) Workshop and Competition 2022

Synthesizing Personalized Non-speech Vocalization from Discrete Speech Representations

Chin-Cheng Hsu


We formulated non-speech vocalization (NSV) modeling as a text-to-speech (TTS) task and verified its viability. Specifically, we evaluated the phonetic expressivity of HuBERT speech units on NSVs and verified our model's ability to generalize to few-shot speakers. In addition, we illustrated one of the major challenges in the ExVo dataset by visualizing the speaker space our model learned, and we discussed possible improvements for future research. Audio samples of synthesized NSVs can be found on our anonymized demo page.
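
For readers unfamiliar with discrete speech units, the following is a minimal sketch of how frame-level features are commonly turned into a unit sequence that can stand in for text in a TTS-style model. It is an illustration under stated assumptions, not the pipeline used in this work: the abstract does not describe the quantizer, so nearest-centroid assignment and the collapsing of repeated units are assumed here, and the names `quantize`, `collapse_repeats`, and the toy 2-D features and centroids are hypothetical.

```python
from itertools import groupby


def quantize(frames, centroids):
    """Map each feature frame to the index of its nearest centroid.

    Assumption: units come from nearest-centroid assignment (squared
    Euclidean distance); the source does not specify the quantizer.
    """
    units = []
    for frame in frames:
        dists = [
            sum((f - c) ** 2 for f, c in zip(frame, centroid))
            for centroid in centroids
        ]
        units.append(dists.index(min(dists)))
    return units


def collapse_repeats(units):
    """Merge runs of identical consecutive units into a single unit."""
    return [unit for unit, _ in groupby(units)]


# Toy stand-ins for learned centroids and per-frame features (hypothetical).
centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [
    [0.1, -0.1], [0.2, 0.1],   # closest to centroid 0
    [0.9, 1.2], [1.1, 0.8],    # closest to centroid 1
    [0.1, 1.9],                # closest to centroid 2
    [0.0, 0.1],                # closest to centroid 0
]

units = quantize(frames, centroids)
deduped = collapse_repeats(units)
print(units)    # [0, 0, 1, 1, 2, 0]
print(deduped)  # [0, 1, 2, 0]
```

In this sketch the collapsed sequence plays the role that a phoneme or character sequence plays in conventional TTS; whether to collapse repeats at all is a design choice, since doing so discards duration information.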
