ICML Discuss
Infinite-Word Topic Models
by Austin Waters at ICML 2012
We present the Infinite-Word Topic Model (IWTM), a non-parametric extension of Latent Dirichlet Allocation (LDA) for modeling collections of images and other non-text documents. Whereas LDA requires that document features be preprocessed into a bag-of-words representation with a fixed vocabulary, IWTM incorporates feature clustering inside the probabilistic model and treats the vocabulary size as a random quantity to be inferred. Using the Hierarchical Dirichlet Process, IWTM defines a topic model over an a priori infinite set of 'words'. We derive a collapsed Gibbs sampler for the model and present competitive results on two image classification tasks.
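To give a feel for what it means to treat the vocabulary size as a random quantity, here is a minimal sketch of a Chinese Restaurant Process, the partition prior induced by a single Dirichlet process. It is not the IWTM model or its Gibbs sampler: the abstract does not give those details, and IWTM uses a Hierarchical Dirichlet Process over real feature data. The function name `crp_assignments`, the concentration parameter `alpha`, and the feature count are all assumptions made for illustration.

```python
import random


def crp_assignments(num_features, alpha, seed=0):
    """Sample cluster ('word') labels from a Chinese Restaurant Process prior.

    Illustrative assumption: each feature joins an existing word with
    probability proportional to that word's current count, or opens a new
    word with probability proportional to alpha. No feature values or
    likelihood terms are involved, so this is a prior draw only.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of features currently assigned to word k
    labels = []  # labels[n] = word index chosen for feature n
    for _ in range(num_features):
        # Existing words are weighted by their counts; the last slot is "new word".
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new word
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


labels, counts = crp_assignments(num_features=1000, alpha=2.0)
print("sampled vocabulary size:", len(counts))
```

The number of distinct words is not fixed in advance: it is an outcome of the draw, and larger values of `alpha` tend to produce more words. A full model would combine such a prior with a likelihood over the observed features and infer the assignments from data.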
