

Spotlight
in
Workshop: The ICML Expressive Vocalizations (ExVo) Workshop and Competition 2022

Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms

Marco Jiralerspong · Gauthier Gidel


Abstract:

We describe our approach to the generative emotional vocal burst task (ExVo Generate) of the ICML Expressive Vocalizations Competition (Baird et al., 2022). We train a conditional StyleGAN2 (Karras et al., 2019) architecture on mel-spectrograms of preprocessed versions of the audio samples. The mel-spectrograms generated by the model are then inverted back to the audio domain using the Griffin-Lim algorithm. Our generated samples improve on the baseline provided by Baird et al. (2022) both qualitatively and quantitatively. More precisely, we improve on the baseline's Fréchet Audio Distance (FAD) for every emotion, by a factor ranging from 1.97 (Awe) to 3.9 (Sadness).
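To make the FAD comparison concrete, here is a minimal sketch of a Fréchet distance between two sets of audio embeddings, and of the improvement factor as the ratio of two such distances. It is not the competition's evaluation code. The function name `frechet_distance`, the 8-dimensional random "embeddings", and the mean shifts are all assumptions made for illustration. In practice, FAD is computed on embeddings from a pretrained audio model, not on synthetic Gaussians.

```python
import numpy as np


def frechet_distance(emb_ref: np.ndarray, emb_gen: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two embedding sets.

    Each input has shape (n_samples, n_features). Hypothetical helper,
    not taken from the ExVo evaluation code.
    """
    mu_r, mu_g = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    cov_r = np.cov(emb_ref, rowvar=False)
    cov_g = np.cov(emb_gen, rowvar=False)

    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


# Synthetic stand-ins for embeddings (assumption: 500 clips, 8 features).
rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, size=(500, 8))
baseline_like = rng.normal(loc=1.0, size=(500, 8))  # far from the real set
improved_like = rng.normal(loc=0.3, size=(500, 8))  # closer to the real set

fad_baseline = frechet_distance(real, baseline_like)
fad_improved = frechet_distance(real, improved_like)

# "Improvement by a factor of k" read as baseline FAD / new FAD.
improvement_factor = fad_baseline / fad_improved
print(f"baseline={fad_baseline:.3f} improved={fad_improved:.3f} "
      f"factor={improvement_factor:.2f}")
```

Lower FAD means the generated distribution is closer to the reference one, so a factor above 1 indicates an improvement over the baseline.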
