

Spotlight in Workshop: The ICML Expressive Vocalizations (ExVo) Workshop and Competition 2022

Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction

Andreas Triantafyllopoulos · Meishu Song · Zijiang Yang · Xin Jing · Björn Schuller


Abstract:

In this work, we explore a novel few-shot personalisation architecture for emotional vocalisation prediction. The core contribution is an "enrolment" encoder which utilises two unlabelled samples of the target speaker to adjust the output of the emotion encoder; the adjustment is based on dot-product attention, thus effectively functioning as a form of "soft" feature selection. The emotion and enrolment encoders are based on two standard audio architectures: CNN14 and CNN10. The two encoders are further guided to forget or learn auxiliary emotion and/or speaker information. Our best approach achieves a concordance correlation coefficient (CCC) of .650 on the ExVo Few-Shot dev set, a 2.5% relative increase over our baseline CNN14 CCC of .634.
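
To make the reported numbers concrete, the following is a minimal sketch of how a CCC score and the quoted improvement could be computed. It assumes that CCC here means the standard concordance correlation coefficient; the function names `ccc` and `relative_gain` are hypothetical and not taken from the authors' code, and the abstract does not say how scores are aggregated across emotion dimensions.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences.

    Assumption: the standard definition with population (biased) statistics.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    # Population covariance between labels and predictions.
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    # Unlike Pearson correlation, the denominator also penalises a mean shift.
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def relative_gain(new, base):
    """Relative improvement of `new` over `base`, as a fraction."""
    return (new - base) / base


# Figures quoted in the abstract: best model .650 vs. CNN14 baseline .634.
gain = relative_gain(0.650, 0.634)
print(f"relative gain: {gain:.1%}")
```

Because the denominator includes the squared difference of means, a prediction that is perfectly correlated with the labels but offset by a constant still scores below 1, which is why CCC is a stricter target than plain correlation.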
