We present the Infinite-Word Topic Model (IWTM), a non-parametric extension of Latent Dirichlet Allocation (LDA) for modeling collections of images and other non-text documents. Whereas LDA requires that document features be preprocessed into a bag-of-words representation with a fixed vocabulary, IWTM incorporates feature clustering inside the probabilistic model and treats the vocabulary size as a random quantity to be inferred. By making use of the Hierarchical Dirichlet Process, IWTM defines a topic model over an a priori infinite set of "words". We derive a collapsed Gibbs sampler for the model and present competitive results on two image classification tasks.
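The abstract mentions a collapsed Gibbs sampler but does not give the IWTM sampler itself. For background only, here is a minimal sketch of collapsed Gibbs sampling for standard finite-vocabulary LDA (the model IWTM extends); all function and variable names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def collapsed_gibbs_lda(docs, n_topics, n_vocab, n_iters=100,
                        alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for standard LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids.
    Topic proportions and topic-word distributions are integrated
    out; only the topic assignment of each token is sampled.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # doc-topic counts
    n_kw = np.zeros((n_topics, n_vocab))     # topic-word counts
    n_k = np.zeros(n_topics)                 # tokens per topic
    # Random initialization of topic assignments.
    z = [rng.integers(n_topics, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's counts, then resample its topic
                # from the collapsed conditional distribution.
                k = z[d][i]
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) \
                    / (n_k + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return z, n_dk, n_kw
```

IWTM differs in that the vocabulary is not fixed in advance: the HDP lets the number of "words" grow with the data, so the sampler must also handle cluster creation and deletion, which this finite-vocabulary sketch omits.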