Mitigating modality collapse in multimodal VAEs via impartial optimization
Adrián Javaloy · Maryam Meghdadi · Isabel Valera

A number of variational autoencoders (VAEs) have recently emerged with the aim of modeling multimodal data, e.g., to jointly model images and their corresponding captions. Still, multimodal VAEs tend to focus solely on a subset of the modalities, e.g., by fitting the image while neglecting the caption. We refer to this limitation as modality collapse. In this work, we argue that this effect is a consequence of conflicting gradients during multimodal VAE training. We show how to detect the sub-graphs of the computational graph where gradients conflict (impartiality blocks), as well as how to leverage existing gradient-conflict solutions from multitask learning to mitigate modality collapse, that is, to ensure impartial optimization. We apply our training framework to several multimodal VAE models, losses, and datasets from the literature, and empirically show that our framework significantly improves the reconstruction performance, conditional generation, and coherence of the latent space across modalities.
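
To make the notion of conflicting gradients concrete, the following is a minimal sketch and not the authors' implementation. It assumes that two gradients conflict when their inner product is negative, and it uses a PCGrad-style projection as one example of a gradient-conflict solution from multitask learning; the abstract does not say which solutions the paper uses. The names `g_image`, `g_caption`, `conflicts`, and `project_out` are hypothetical, and the gradient values are made up for illustration.

```python
def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def conflicts(g_a, g_b):
    """Assumed criterion: gradients conflict when their inner product is negative."""
    return dot(g_a, g_b) < 0.0


def project_out(g_a, g_b):
    """PCGrad-style step: if g_a conflicts with g_b, remove the component of g_a along g_b."""
    d = dot(g_a, g_b)
    if d >= 0.0:
        return list(g_a)  # no conflict, leave the gradient unchanged
    scale = d / dot(g_b, g_b)
    return [a - scale * b for a, b in zip(g_a, g_b)]


# Hypothetical gradients of two modality losses w.r.t. shared parameters.
g_image = [1.0, 0.0]
g_caption = [-1.0, 1.0]

# Naively summing the gradients cancels the image direction entirely.
naive_update = [a + b for a, b in zip(g_image, g_caption)]

# Project each gradient against the other, then sum the adjusted gradients.
g_image_adj = project_out(g_image, g_caption)
g_caption_adj = project_out(g_caption, g_image)
update = [a + b for a, b in zip(g_image_adj, g_caption_adj)]
```

In this toy case the naive sum is orthogonal to `g_image`, so a step along it makes no first-order progress on the image loss, whereas the adjusted update has a positive inner product with both original gradients.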

Author Information

Adrián Javaloy (Saarland University)
Maryam Meghdadi (Saarland University)
Isabel Valera (Saarland University)

Isabel Valera is a full Professor of Machine Learning at the Department of Computer Science of Saarland University in Saarbrücken (Germany), and Adjunct Faculty at MPI for Software Systems in Saarbrücken (Germany). She is also a scholar of the European Laboratory for Learning and Intelligent Systems (ELLIS). Prior to this, she was an independent group leader at the MPI for Intelligent Systems in Tübingen (Germany). She has held a German Humboldt Post-Doctoral Fellowship and a “Minerva fast track” fellowship from the Max Planck Society. She obtained her PhD in 2014 and her MSc degree in 2012 from the University Carlos III in Madrid (Spain), and worked as a postdoctoral researcher at the MPI for Software Systems (Germany) and at the University of Cambridge (UK). Her research focuses on developing machine learning methods that are flexible, robust, interpretable, and fair for analyzing real-world data.
