
Workshop: Object-Oriented Learning: Perception, Representation, and Reasoning

Hierarchical Decomposition and Generation of Scenes with Compositional Objects

Fei Deng


Compositional structures between parts and objects are inherent in natural scenes. Recent work on representation learning has succeeded in modeling scenes as compositions of objects, but further decomposition of objects into parts and subparts has largely been overlooked. In this paper, we propose RICH, the first deep latent variable model for learning Representation of Interpretable Compositional Hierarchies. At the core of RICH is a latent scene graph representation that organizes the entities of a scene into a tree according to their compositional relationships. During inference, RICH takes a top-down approach, allowing higher-level representations to guide lower-level decomposition when there is compositional ambiguity. In experiments on images containing multiple compositional objects, we demonstrate that RICH is able to learn the latent compositional hierarchy, generate imaginary scenes, and improve data efficiency in downstream tasks.
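
To make the tree-structured scene graph and the top-down order more concrete, the following is a minimal sketch of the data structure only. It is not the RICH model: it contains no latent variables or neural networks, and the names `SceneNode` and `top_down` as well as the example entities are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class SceneNode:
    """One entity in a hypothetical scene tree: the scene, an object, or a part."""
    name: str
    children: List["SceneNode"] = field(default_factory=list)


def top_down(
    node: SceneNode,
    parent: Optional[SceneNode] = None,
    depth: int = 0,
) -> Iterator[Tuple[int, SceneNode, Optional[SceneNode]]]:
    """Visit each node before its children, so a parent is always seen first.

    In a real model the parent's representation would condition the
    decomposition of its children; here we only expose the visiting order.
    """
    yield depth, node, parent
    for child in node.children:
        yield from top_down(child, node, depth + 1)


# Illustrative scene: two objects, each composed of parts (assumed example).
scene = SceneNode("scene", [
    SceneNode("object_1", [SceneNode("part_1a"), SceneNode("part_1b")]),
    SceneNode("object_2", [SceneNode("part_2a")]),
])

order = [
    (depth, node.name, parent.name if parent else None)
    for depth, node, parent in top_down(scene)
]

for depth, name, parent_name in order:
    print("  " * depth + name)
```

Because every node is visited after its parent, any quantity computed for the parent is available when its children are processed, which is the property a top-down scheme relies on.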
