Spotlight in Workshop: The ICML Expressive Vocalizations (ExVo) Workshop and Competition 2022
Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms
Marco Jiralerspong · Gauthier Gidel
Abstract:
We describe our approach for the generative emotional vocal burst task (ExVo Generate) of the ICML Expressive Vocalizations Competition (Baird et al., 2022). We train a conditional StyleGAN2 (Karras et al., 2019) architecture on mel-spectrograms of preprocessed versions of the audio samples. The mel-spectrograms generated by the model are then inverted back to the audio domain using Griffin-Lim. Our generated samples significantly improve upon the baseline provided by Baird et al. (2022), both qualitatively and quantitatively. More precisely, on all emotions, we improve the FAD of the baseline by a factor ranging from 1.97 (Awe) to 3.9 (Sadness).
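The final step of the pipeline, going from a mel-spectrogram back to a waveform with Griffin-Lim, might look roughly like the NumPy sketch below. It is an illustration under stated assumptions, not the authors' implementation: the sample rate, FFT size, hop length, and number of mel bands are placeholder values, a synthetic sine tone stands in for a vocal burst, the mel-spectrogram is computed from that tone rather than produced by a StyleGAN2 generator, and the mel-to-linear step uses a simple pseudo-inverse of the filterbank as an approximation. All function names are hypothetical.

```python
import numpy as np

def stft(y, n_fft=512, hop=128):
    """Hann-windowed STFT, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft + 1)[:-1]          # periodic Hann window
    y = np.pad(y, n_fft // 2, mode="reflect")    # centre the frames
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(S, n_fft=512, hop=128):
    """Overlap-add inverse of stft() with squared-window normalisation."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(S.T, n=n_fft, axis=1) * window
    out = np.zeros(n_fft + hop * (frames.shape[0] - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)
    return out[n_fft // 2:-(n_fft // 2)]         # undo the centring pad

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters (HTK-style mel scale), shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, ctr, hi = edges[m:m + 3]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def griffin_lim(mag, n_fft=512, hop=128, n_iter=32, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `mag`."""
    rng = np.random.default_rng(seed)
    angles = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        y = istft(mag * angles, n_fft, hop)
        rebuilt = stft(y, n_fft, hop)
        # Keep the phase of the re-analysed signal, discard its magnitude.
        angles = rebuilt / np.maximum(np.abs(rebuilt), 1e-8)
    return istft(mag * angles, n_fft, hop)

# Assumed settings; the abstract does not specify them.
sr, n_fft, hop, n_mels = 16000, 512, 128, 64
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)          # synthetic stand-in signal

fb = mel_filterbank(sr, n_fft, n_mels)
mel = fb @ np.abs(stft(y, n_fft, hop))           # magnitude mel-spectrogram
# Approximate mel -> linear magnitude (pseudo-inverse, clipped at zero).
linear = np.maximum(0.0, np.linalg.pinv(fb) @ mel)
y_hat = griffin_lim(linear, n_fft, hop)
```

In this sketch, `mel` plays the role of a generator output. Griffin-Lim only estimates phase, so `y_hat` is an approximation of the original signal rather than an exact reconstruction.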