We present a deep generative model that explicitly models object occlusions for compositional scene representation. Latent representations of objects are disentangled into location, size, shape, and appearance, and the visual scene can be generated compositionally by integrating these representations with an infinite-dimensional binary vector indicating the presence of objects in the scene. By training the model to learn spatial dependencies of pixels in the unsupervised setting, the number of objects, the pixel-level segregation of objects, and the presence of objects in overlapping regions can be estimated through inference of latent variables. Extensive experiments conducted on a series of specially designed datasets demonstrate that the proposed method outperforms two state-of-the-art methods when object occlusions exist.
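As a rough illustration of the compositional generation described above, the following NumPy sketch layers objects over a background using per-object shape masks, appearances, and binary presence flags, so that nearer objects occlude farther ones. It is not the authors' model: the function name `compose_scene`, the truncation of the infinite-dimensional presence vector to a fixed number of slots K, and the convention that index 0 is the front-most object are all assumptions made for illustration.

```python
import numpy as np

def compose_scene(background, appearances, shapes, presences):
    """Composite objects over a background, front-most object first.

    background:  (H, W, C) array
    appearances: (K, H, W, C) per-object appearance
    shapes:      (K, H, W) per-object soft masks in [0, 1]
    presences:   (K,) binary flags, 1 if object k is in the scene
    Index 0 is assumed to be the front-most object (illustrative convention).
    """
    background = np.asarray(background, dtype=float)
    image = np.zeros_like(background)
    # Fraction of each pixel that is not yet covered by a nearer object.
    visible = np.ones(background.shape[:2], dtype=float)
    for app, shape, z in zip(appearances, shapes, presences):
        # Absent objects (z = 0) contribute nothing and occlude nothing.
        alpha = z * np.asarray(shape, dtype=float) * visible
        image += alpha[..., None] * np.asarray(app, dtype=float)
        visible -= alpha
    # Whatever remains uncovered shows the background.
    image += visible[..., None] * background
    return image
```

Setting a presence flag to 0 removes that object from the scene and reveals whatever lies behind it, which is the role the binary presence vector plays in the description above.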
Jinyang Yuan (Fudan University)
Bin Li (Fudan University)
Xiangyang Xue (Fudan University)
Xiangyang Xue received the BS, MS, and PhD degrees in communication engineering from Xidian University, Xi’an, China, in 1989, 1992, and 1995, respectively. He is currently a professor of computer science with Fudan University, Shanghai, China. His research interests include multimedia information processing, computer vision, and machine learning.
Related Events (a corresponding poster, oral, or spotlight)
2019 Oral: Generative Modeling of Infinite Occluded Objects for Compositional Scene Representation
Wed Jun 12th 03:10 -- 03:15 PM Room Seaside Ballroom