Learning to Cluster using Local Neighborhood Structure
Romer Rosales - University of Toronto / MIT
Kannan Achan - University of Toronto
Brendan Frey - University of Toronto
This paper introduces an approach to clustering/classification based on the local, high-order structure present in the data. For some problems, this local structure might be more relevant for classification than other measures of point similarity used by popular unsupervised and semi-supervised clustering methods. Under this approach, changes in the class label are associated with changes in the local properties of the data. Using this idea, we also seek to learn how to cluster given examples of clustered data, including examples from different datasets. We make these concepts formal by presenting a probability model that captures their fundamentals and show that, in this setting, learning to cluster is a well-defined and tractable task. Based on probabilistic inference methods, we then present an algorithm for computing the posterior probability distribution of class labels for each data point. Experiments in the domains of spatial grouping and functional gene classification are used to illustrate and test these concepts.