

Spotlight in Workshop: The ICML Expressive Vocalizations (ExVo) Workshop and Competition 2022

Self-supervision and Learnable STRFs for Age, Emotion and Country Prediction

Roshan Sharma · Tyler Vuong · Mark Lindsey · Hira Dhamyal · Bhiksha Raj · Rita Singh


Abstract:

This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion from vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge. The method combines spectro-temporal modulation features with self-supervised features, followed by an encoder-decoder network organized in a multitask paradigm. We evaluate the complementarity between the tasks by comparing independent task-specific models with joint models, and we explore the relative strengths of different feature sets. We also introduce a simple score fusion mechanism to leverage the complementarity of these feature sets. We find that robust data pre-processing, in conjunction with score fusion over spectro-temporal receptive field (STRF) and HuBERT models, achieves our best test score of 41.2.
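
The abstract does not specify how the score fusion is computed. As a rough illustration only, the sketch below assumes one common form of score-level fusion: a weighted average of the per-class scores produced by two models. The function name `fuse_scores`, the weights, and the example scores are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def fuse_scores(
    score_sets: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse per-model score vectors by weighted averaging.

    Assumption: this is a generic late-fusion scheme, not necessarily
    the mechanism used by the authors.
    """
    if not score_sets:
        raise ValueError("need at least one score vector")
    n = len(score_sets[0])
    if any(len(scores) != n for scores in score_sets):
        raise ValueError("all score vectors must have the same length")
    if weights is None:
        # Default to equal weighting across models.
        weights = [1.0] * len(score_sets)
    if len(weights) != len(score_sets):
        raise ValueError("need exactly one weight per score vector")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted mean of the models' scores, computed per output dimension.
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_sets)) / total
        for i in range(n)
    ]


# Hypothetical per-class scores from an STRF-based and a HuBERT-based model.
strf_scores = [0.2, 0.6, 0.2]
hubert_scores = [0.4, 0.4, 0.2]
fused = fuse_scores([strf_scores, hubert_scores])
```

In practice the weights would typically be tuned on a validation set, and the same averaging could be applied to regression outputs such as age.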
