Automated Hierarchical Mixtures of Probabilistic Principal Component Analyzers
Ting Su - Northeastern University
Jennifer Dy - Northeastern University
Many clustering algorithms fail when dealing with high-dimensional data. Principal component analysis (PCA) is a popular dimensionality reduction algorithm. However, it assumes a single multivariate Gaussian model, which provides only a global linear projection of the data. A mixture of probabilistic principal component analyzers (PPCA) offers a better model for clustering: it provides a local linear PCA projection for each multivariate Gaussian cluster component. We extend this model to build hierarchical mixtures of PPCA. Hierarchical clustering provides a flexible representation that shows relationships among clusters at various perceptual levels. We introduce an automated hierarchical mixture of PPCA algorithm, which uses the integrated classification likelihood as the criterion for splitting clusters and for stopping the addition of hierarchical levels. An automated approach requires automated methods for initialization, for determining the number of principal component dimensions, and for determining when to split clusters. We address each of these in the paper. This automated approach results in a coarse-to-fine local component model with varying projections and a different number of dimensions for each cluster.
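
As a rough illustration of how an integrated classification likelihood (ICL) criterion can drive a split decision, the sketch below uses the common BIC-style approximation of ICL: the log-likelihood, minus a parameter-count penalty, minus the entropy of the cluster responsibilities. The abstract does not state the exact form used in the paper, so this approximation is an assumption. The function names `icl_bic` and `should_split`, and the rule "split when the child model scores higher," are likewise hypothetical and serve only as illustration.

```python
import math


def icl_bic(log_likelihood, responsibilities, n_params):
    """BIC-style approximation of ICL (assumed form; higher is better).

    log_likelihood   : fitted mixture log-likelihood of the data
    responsibilities : per-sample lists of posterior cluster probabilities
    n_params         : number of free parameters in the fitted model
    """
    n = len(responsibilities)
    # BIC in "higher is better" form: log-likelihood minus complexity penalty.
    bic = log_likelihood - 0.5 * n_params * math.log(n)
    # Entropy of the soft assignments; zero when every assignment is hard.
    entropy = -sum(
        r * math.log(r) for row in responsibilities for r in row if r > 0.0
    )
    return bic - entropy


def should_split(icl_parent, icl_children):
    """Hypothetical rule: split a cluster only if the refined model scores higher."""
    return icl_children > icl_parent
```

Because the entropy term penalizes overlapping components, a criterion of this form tends to favor splits that produce well-separated clusters. When no candidate split improves the score, the hierarchy stops growing at that node.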