Geometric Multimodal Contrastive Representation Learning

Petra Poklukar · Miguel Vasco · Hang Yin · Francisco S. Melo · Ana Paiva · Danica Kragic

Hall E #431

Keywords: [ MISC: Unsupervised and Semi-supervised Learning ] [ MISC: Representation Learning ] [ MISC: Supervised Learning ] [ DL: Other Representation Learning ]

Wed 20 Jul 3:30 p.m. PDT — 5:30 p.m. PDT
Spotlight presentation: Deep Learning
Wed 20 Jul 1:30 p.m. PDT — 3 p.m. PDT


Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method with two main components: i) a two-level architecture consisting of modality-specific base encoders, which map an arbitrary number of modalities to intermediate representations of fixed dimensionality, and a shared projection head, which maps the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems, including prediction and reinforcement learning tasks.
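
The two-level architecture and the alignment objective described in the abstract can be illustrated with a small standard-library sketch. It is not the authors' implementation: the modality names, dimensions, random linear encoders, and the InfoNCE-style form of the loss are all assumptions made for illustration, since the abstract does not specify the network layers or the exact loss.

```python
import math
import random


def linear(weights, x):
    """Apply a bias-free linear map given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def normalize(z):
    """Scale a vector to unit L2 norm (left unchanged if the norm is zero)."""
    norm = math.sqrt(sum(v * v for v in z)) or 1.0
    return [v / norm for v in z]


def random_matrix(rng, rows, cols):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def encode(base_encoder, projection_head, x):
    """Two-level encoding: modality-specific base encoder, then shared head."""
    hidden = [math.tanh(h) for h in linear(base_encoder, x)]
    return normalize(linear(projection_head, hidden))


def contrastive_loss(za, zb, temperature=0.1):
    """InfoNCE-style loss (assumed form): row i of za should match row i of zb."""
    total = 0.0
    for i, anchor in enumerate(za):
        logits = [
            sum(p * q for p, q in zip(anchor, other)) / temperature
            for other in zb
        ]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum - logits[i]
    return total / len(za)


rng = random.Random(0)
INTERMEDIATE_DIM, LATENT_DIM, BATCH = 8, 4, 5
input_dims = {"image": 12, "audio": 6}  # hypothetical modalities and sizes

# One base encoder per modality; every modality shares the same projection head.
base_encoders = {
    name: random_matrix(rng, INTERMEDIATE_DIM, dim)
    for name, dim in input_dims.items()
}
projection_head = random_matrix(rng, LATENT_DIM, INTERMEDIATE_DIM)

# A toy batch of paired observations, one list of vectors per modality.
batch = {
    name: [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(BATCH)]
    for name, dim in input_dims.items()
}
latents = {
    name: [encode(base_encoders[name], projection_head, x) for x in samples]
    for name, samples in batch.items()
}

loss = contrastive_loss(latents["image"], latents["audio"])
aligned = contrastive_loss(latents["image"], latents["image"])
```

Because inputs of different sizes pass through their own base encoder into a common intermediate dimensionality, a single projection head can serve every modality, and a missing modality at test time simply means its encoder is not called. In this sketch `aligned` (the loss for perfectly matching representations) can never exceed `math.log(BATCH)`, which is the kind of geometric alignment the loss rewards.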
