Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms
Marco Jiralerspong · Gauthier Gidel
2022 Spotlight
in
Workshop: The ICML Expressive Vocalizations (ExVo) Workshop and Competition 2022
Abstract
We describe our approach for the generative emotional vocal burst task (ExVo Generate) of the ICML Expressive Vocalizations Competition (Baird et al., 2022). We train a conditional StyleGAN2 (Karras et al., 2019) architecture on mel-spectrograms of preprocessed versions of the audio samples. The mel-spectrograms generated by the model are then inverted back to the audio domain using the Griffin-Lim algorithm. Our generated samples improve substantially upon the baseline provided by Baird et al. (2022), both qualitatively and quantitatively. More precisely, on all emotions, we improve the Fréchet Audio Distance (FAD) of the baseline by a factor ranging from 1.97 (Awe) to 3.9 (Sadness).
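The inversion step mentioned in the abstract can be illustrated with a minimal NumPy sketch of Griffin-Lim phase reconstruction. This is not the authors' code: the function names, the FFT size, the hop length, and the iteration count are all assumptions chosen for illustration. The sketch also assumes the generated mel-spectrogram has already been mapped back to a linear-frequency magnitude spectrogram (for example with a pseudo-inverse of the mel filterbank), a step omitted here.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=128):
    """Windowed overlap-add inverse of stft()."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        start = i * hop
        out[start:start + n_fft] += frames[i] * window
        norm[start:start + n_fft] += window ** 2
    # Floor the normaliser so the weakly covered edges are not amplified.
    return out / np.maximum(norm, 0.1)

def griffin_lim(magnitude, n_iter=50, n_fft=512, hop=128, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`.

    `magnitude` is a linear-frequency magnitude spectrogram of shape
    (frames, n_fft // 2 + 1). The phase starts out random and is refined
    by alternating between the time and time-frequency domains.
    """
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        audio = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(audio, n_fft, hop)
        # Keep only the phase of the re-analysed signal.
        phase = rebuilt / np.maximum(np.abs(rebuilt), 1e-8)
    return istft(magnitude * phase, n_fft, hop)
```

Calling `griffin_lim(magnitude)` on a spectrogram with `T` frames returns a waveform of `n_fft + hop * (T - 1)` samples. Because the phase is only estimated, the result is an approximation of the underlying signal, and audible artifacts are possible.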