Spotlight in Workshop: Object-Oriented Learning: Perception, Representation, and Reasoning
Hierarchical Decomposition and Generation of Scenes with Compositional Objects
Fei Deng
Compositional structures between parts and objects are inherent in natural scenes. Recent work on representation learning has succeeded in modeling scenes as compositions of objects, but further decomposition of objects into parts and subparts has largely been overlooked. In this paper, we propose RICH, the first deep latent variable model for learning Representation of Interpretable Compositional Hierarchies. At the core of RICH is a latent scene graph representation that organizes the entities of a scene into a tree according to their compositional relationships. During inference, RICH takes a top-down approach, allowing higher-level representations to guide lower-level decomposition when there is compositional ambiguity. In experiments on images containing multiple compositional objects, we demonstrate that RICH is able to learn the latent compositional hierarchy, generate imaginary scenes, and improve data efficiency in downstream tasks.
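As an illustration only, the tree-structured scene graph described in the abstract can be pictured as a nested data structure in which a scene contains objects and objects contain parts. The sketch below is not RICH's implementation: the names `Node` and `top_down` are hypothetical, and the learned latent variables are replaced by plain string labels. It shows only the parent-before-child visiting order that a top-down pass over such a tree would follow.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """One entity in a hypothetical scene tree (scene, object, or part)."""
    name: str
    children: List["Node"] = field(default_factory=list)


def top_down(node: Node, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, name) pairs, visiting each parent before its children."""
    yield depth, node.name
    for child in node.children:
        yield from top_down(child, depth + 1)


# Toy scene (assumed for illustration): two objects, one made of two parts.
scene = Node("scene", [
    Node("object_a", [Node("part_a1"), Node("part_a2")]),
    Node("object_b"),
])

visits = list(top_down(scene))
order = [name for _, name in visits]
max_depth = max(depth for depth, _ in visits)
```

Because each parent is visited before its children, any information attached to a higher-level node is available when its lower-level nodes are processed, which is the ordering property the abstract attributes to top-down inference.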