Algorithms and Hardness for Active Learning on Graphs
Abstract
We study the offline active learning problem on graphs. In this problem, one seeks to select k vertices whose labels are best suited for predicting the labels of all the other vertices in the graph.Guillory and Bilmes (Guillory & Bilmes, 2009) introduced a natural theoretical model motivated by a label smoothness assumption. Prior to our work, algorithms with theoretical guarantees were only known for restricted graph types such as trees (Cesa-Bianchi et al., 2010) despite the models simplicity. We present the first O(log n)-resource augmented algorithm for general weighted graphs. To complement our algorithm, we show constant hardness of approximation.
Lay Summary
In active learning, you get to choose which data points are labeled. A real world example are surveys, where respondents are usually carefully picked rather than selected arbitrarily. We give a new algorithm for selecting which data points to label based on similarity information between data points. For a natural objective, our algorithm always performs at least as well as the optimal algorithm with a more limited budget, and we observe that it outperforms previous algorithms in experiments.