Poster in Workshop: The Second Workshop on Spurious Correlations, Invariance and Stability
Separating multimodal modeling from multidimensional modeling for multimodal learning
Divyam Madaan · Taro Makino · Sumit Chopra · Kyunghyun Cho
Multimodal learning is the problem of learning to map a set of separate modalities to a target. Despite this intuitive definition, it is unclear whether one should approach the problem with a multidimensional model, in which the features from all modalities are concatenated and treated as multidimensional features of a single modality, or with a multimodal model, which exploits information about the modality boundaries. In this work, we formalize a framework for multimodal learning and identify the conditions that favor multimodal modeling over multidimensional modeling. Through a series of synthetic experiments in which we fully control the data-generating process, we demonstrate, for the first time, the necessity of multimodal modeling for solving a multimodal learning problem. Our proposed framework makes no assumptions about model architectures, and can therefore have a widespread impact by informing modeling choices whenever data come from different modalities.
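To make the contrast concrete, below is a minimal, hypothetical sketch (in PyTorch; not the authors' models or experimental setup) of the two modeling choices for two input modalities: a multidimensional model that concatenates the raw features and ignores modality boundaries, and a multimodal model that encodes each modality separately before fusing, thereby using the boundaries. All names and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultidimensionalModel(nn.Module):
    """Ignores modality boundaries: concatenate raw features, apply one network."""
    def __init__(self, d1, d2, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d1 + d2, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x1, x2):
        # Treat the two modalities as one multidimensional feature vector.
        return self.net(torch.cat([x1, x2], dim=-1))

class MultimodalModel(nn.Module):
    """Uses modality boundaries: encode each modality separately, then fuse."""
    def __init__(self, d1, d2, hidden=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Linear(d1, hidden), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Linear(d2, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x1, x2):
        # Per-modality encoders make the modality boundary explicit.
        return self.head(torch.cat([self.enc1(x1), self.enc2(x2)], dim=-1))

# Example usage with random inputs of dimensions 4 and 6.
x1, x2 = torch.randn(8, 4), torch.randn(8, 6)
print(MultidimensionalModel(4, 6)(x1, x2).shape)  # torch.Size([8, 1])
print(MultimodalModel(4, 6)(x1, x2).shape)        # torch.Size([8, 1])
```

Both models map the same inputs to the same output shape; the question the abstract raises is under which data-generating conditions the second, boundary-aware parameterization is actually necessary.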