Faster Attend-Infer-Repeat with Tractable Probabilistic Models
Karl Stelzner · Robert Peharz · Kristian Kersting

Thu Jun 13th 05:10 -- 05:15 PM @ Grand Ballroom

The recent attend-infer-repeat (AIR) framework marks a milestone in Bayesian scene understanding and in the promising avenue of structured probabilistic modeling. The AIR model expresses the composition of visual scenes from individual objects, and uses variational autoencoders to model the appearance of those objects. However, inference in the overall model is highly intractable, which hampers its learning speed and makes it prone to sub-optimal solutions. In this paper, we show that inference and learning in AIR can be considerably accelerated by replacing the intractable object representations with tractable probabilistic models. In particular, we opt for sum-product networks (SPNs), an expressive class of deep probabilistic models with a rich set of tractable inference routines. As our empirical evidence shows, the resulting model, called SuPAIR, achieves a higher object detection accuracy than the original AIR system, while reducing the learning time by an order of magnitude. Moreover, SuPAIR allows one to treat object occlusions in a consistent manner and to include a background noise model, improving the robustness of Bayesian scene understanding.

Author Information

Karl Stelzner (TU Darmstadt)
Robert Peharz (University of Cambridge)
Kristian Kersting (TU Darmstadt)
