Active Learning Using Pre-clustering
Hieu Nguyen - University of Amsterdam
Arnold Smeulders - University of Amsterdam
The paper is concerned with two-class active learning. While the common approach to collecting data in active learning is to select samples close to the classification boundary, better performance can be achieved by taking the prior data distribution into account. The main contribution of the paper is a formal framework that incorporates clustering into active learning. The algorithm first constructs a classifier on the set of cluster representatives and then propagates the classification decision to the other samples via a local noise model. The proposed model makes it possible to select the most representative samples and to avoid repeatedly labeling samples in the same cluster. During the active learning process, the clustering is adjusted using a coarse-to-fine strategy in order to balance the advantage of large clusters against the accuracy of the data representation. Experiments on image databases show that our algorithm performs better than current methods.
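As a rough illustration of the cluster-then-label idea summarized above, the following Python sketch clusters the data, queries one representative per cluster (largest clusters first), and copies each queried label to the rest of that cluster. It is not the authors' algorithm. It assumes plain k-means as the clustering step, replaces the classifier and local noise model with direct label copying, and omits the coarse-to-fine refinement. The names kmeans, active_learn, oracle, and budget are hypothetical and chosen only for this example.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (an assumed stand-in clustering step).

    Returns (centroids, assign), where assign[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)

    def nearest(p):
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    for _ in range(iters):
        assign = [nearest(p) for p in points]
        # Move each centroid to the mean of its members; keep it if empty.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Final assignment against the final centroids.
    return centroids, [nearest(p) for p in points]


def active_learn(points, oracle, k, budget):
    """Query at most `budget` cluster representatives and propagate labels.

    `oracle(i)` is a hypothetical labeling function returning the label of
    points[i]. Returns (labels, queried); labels[i] is None if the cluster
    of points[i] was never queried.
    """
    centroids, assign = kmeans(points, k)
    clusters = {}
    for i, a in enumerate(assign):
        clusters.setdefault(a, []).append(i)
    # Representative of a cluster: the member closest to its centroid.
    reps = {
        j: min(idx, key=lambda i: math.dist(points[i], centroids[j]))
        for j, idx in clusters.items()
    }
    # Larger clusters first; each cluster is queried at most once.
    order = sorted(clusters, key=lambda j: len(clusters[j]), reverse=True)
    labels = [None] * len(points)
    queried = []
    for j in order[:budget]:
        y = oracle(reps[j])
        queried.append(reps[j])
        # Simplification: copy the label to every member of the cluster
        # instead of propagating it through a noise model.
        for i in clusters[j]:
            labels[i] = y
    return labels, queried
```

Querying clusters in order of size is one simple way to favor representative samples, and restricting queries to one per cluster avoids spending labels on samples from a cluster that has already been labeled.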