Integrating Constraints and Metric Learning in Semi-Supervised Clustering
Mikhail Bilenko - Department of Computer Sciences, University of Texas at Austin
Sugato Basu - Department of Computer Sciences, University of Texas at Austin
Raymond J. Mooney - Department of Computer Sciences, University of Texas at Austin
Semi-supervised clustering employs a small amount of labeled data to aid unsupervised learning. Previous work in the area has used supervised data in one of two ways: 1) constraint-based methods that guide the clustering algorithm towards a better grouping of the data, and 2) distance-function learning methods that adapt the underlying similarity metric used by the clustering algorithm. This paper provides new methods for both approaches and presents a new semi-supervised clustering algorithm that integrates the two techniques in a uniform, principled framework. Experimental results demonstrate that the unified approach produces better clusters than either individual approach, as well as better clusters than previously proposed semi-supervised clustering algorithms.
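The abstract does not say how the two kinds of supervision are combined, so the following is only a rough illustration, not the authors' algorithm. It is a minimal Python sketch of a k-means-style loop that uses both. Must-link and cannot-link pairs add a constant penalty `w` in the assignment step, which is the constraint-based side. A per-dimension weight vector `a` defines a diagonal distance and is re-estimated from the current clusters, which is the metric-learning side. Several pieces are assumptions made for this sketch: the function names, the constant penalty, the deterministic farthest-first initialization, and the weight update `a_d = n / S_d`.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Sequence[float]


def wdist(x: Point, y: Point, a: Sequence[float]) -> float:
    """Squared distance under a diagonal metric with per-dimension weights a."""
    return sum(ad * (xd - yd) ** 2 for xd, yd, ad in zip(x, y, a))


def neighbors(pairs: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Turn a list of constraint pairs into a symmetric adjacency map."""
    adj: Dict[int, List[int]] = {}
    for i, j in pairs:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    return adj


def constrained_metric_kmeans(X, k, must_link, cannot_link, w=1.0, max_iter=50):
    """Hypothetical sketch: penalized pairwise constraints plus a learned diagonal metric."""
    must, cannot = neighbors(must_link), neighbors(cannot_link)
    n, dim = len(X), len(X[0])
    a = [1.0] * dim  # start from the plain Euclidean metric
    # Deterministic farthest-first initialization of the k centers (an assumption).
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda x: min(wdist(x, c, a) for c in centers))
        centers.append(list(far))
    labels: List[Optional[int]] = [None] * n
    for _ in range(max_iter):
        old = list(labels)
        # Assignment: each point sees its partners' current labels and pays a
        # constant penalty w for every constraint a candidate cluster would violate.
        for i, x in enumerate(X):
            def cost(c: int) -> float:
                pen = sum(w for j in must.get(i, []) if labels[j] is not None and labels[j] != c)
                pen += sum(w for j in cannot.get(i, []) if labels[j] == c)
                return wdist(x, centers[c], a) + pen
            labels[i] = min(range(k), key=cost)
        if labels == old:
            break
        # Center update: mean of each cluster's members (empty clusters keep their center).
        for c in range(k):
            members = [X[i] for i in range(n) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Metric update: a_d = n / S_d minimizes a_d * S_d - n * log(a_d) for each d,
        # where S_d is the within-cluster scatter along dimension d.
        for d in range(dim):
            s = sum((X[i][d] - centers[labels[i]][d]) ** 2 for i in range(n))
            if s > 0:
                a[d] = n / s
    return labels, centers, a
```

For example, `constrained_metric_kmeans([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], 2, [(0, 1)], [(0, 3)])` separates the two groups, and both weights settle at 4.5. Because the penalty here is a constant, it does not scale with the learned metric. Any method that couples the penalty to the metric would need a different cost function.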