

Spotlight in Workshop: The ICML Expressive Vocalizations (ExVo) Workshop and Competition 2022

Multitask vocal burst modeling with ResNets and pre-trained paralinguistic Conformers

Josh Belanich · Krishna Somandepalli · Brian Eoff · Brendan Jou


Abstract:

This technical report presents the modeling approaches used in our submission to the multitask track (ExVo-MultiTask) of the ICML Expressive Vocalizations Workshop & Competition. We first applied image classification models of various sizes to mel-spectrogram representations of the vocal bursts, as is standard in the sound event detection literature. Results from these models show an increase of 21.24% over the baseline system with respect to the harmonic mean of the task metrics, and comprise our team's main submission to the MultiTask track. We then sought to characterize the headroom in the MultiTask track by applying a large pre-trained Conformer model that previously achieved state-of-the-art results on paralinguistic tasks such as speech emotion recognition and mask detection. We additionally investigated the relationship between the sub-tasks of emotional expression, country of origin, and age prediction, and found that the best-performing models are trained as single-task models, which calls into question whether the problem truly benefits from a multitask setting.
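As a rough illustration of how a harmonic mean combines per-task metrics into one score, the following minimal sketch may help. It assumes every task metric has already been expressed as a positive, higher-is-better number; the function name `multitask_score` and the example values are hypothetical and are not taken from the report or from the competition's scoring code.

```python
from statistics import harmonic_mean


def multitask_score(task_scores):
    """Combine per-task scores with a harmonic mean.

    Assumption: each score is positive and higher-is-better. An error
    metric such as MAE would have to be transformed first.
    """
    values = list(task_scores.values())
    if any(v <= 0 for v in values):
        raise ValueError("all task scores must be positive")
    return harmonic_mean(values)


# Made-up scores for the three sub-tasks named in the abstract.
scores = {"emotion": 0.5, "country": 0.5, "age": 0.25}
combined = multitask_score(scores)

# Plain average of the same scores, for comparison.
arithmetic = sum(scores.values()) / len(scores)

print(f"harmonic mean:   {combined:.4f}")
print(f"arithmetic mean: {arithmetic:.4f}")
```

Because the harmonic mean is pulled toward the smallest value, a model that is weak on one sub-task scores noticeably lower than its arithmetic average would suggest.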
