
Oral
in
Workshop: Sampling and Optimization in Discrete Space

Tackling Provably Hard Representative Selection via Graph Neural Networks

Mehran Kazemi · Anton Tsitsulin · Hossein Esfandiari · Mohammad Hossein Bateni · Deepak Ramachandran · Bryan Perozzi · Vahab Mirrokni


Abstract:

Representative Selection (RS) is the problem of finding a small subset of exemplars from a dataset that is representative of the dataset. In this paper, we study RS for unlabeled datasets and focus on finding representatives that optimize the accuracy of a model trained on the selected representatives. Theoretically, we establish a new hardness result for RS by proving that a particular, highly practical variant of it (RS for Learning) is hard to approximate in polynomial time within any reasonable factor, which implies a significant potential gap between the optimum solution of widely used surrogate functions and the actual accuracy of the model. We then study a setting where additional information in the form of a (homophilous) graph structure is available, or can be constructed, between the data points. We show that with an appropriate modeling approach, the presence of such a structure can turn a hard RS (for learning) problem into one that can be effectively solved. To this end, we develop RSGNN, a representation-learning-based RS model built on Graph Neural Networks. Empirically, we demonstrate the effectiveness of RSGNN on problems with predefined graph structures as well as problems with graphs induced from node feature similarities, showing that RSGNN achieves significant improvements over established baselines on a suite of eight benchmarks.
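The two ingredients mentioned in the abstract can be illustrated with a toy sketch: building a graph induced from node feature similarities (here a symmetric k-nearest-neighbour graph), and selecting a small representative subset (here a simple greedy k-center heuristic). This is an illustrative baseline only, not the paper's RSGNN model; all function names and parameters below are assumptions for the sketch.

```python
import numpy as np

def knn_graph(X, k=3):
    """Build a symmetric k-nearest-neighbour adjacency matrix from
    pairwise squared Euclidean distances between feature rows.
    (A stand-in for the 'graph induced from feature similarities'.)"""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)           # exclude self-edges
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        A[i, np.argsort(d2[i])[:k]] = True
    return A | A.T                         # symmetrise

def greedy_k_center(X, m):
    """Pick m representatives by farthest-point greedy: each new pick
    is the point farthest from the current representative set.
    (A classic selection baseline, not the RSGNN objective.)"""
    chosen = [0]                           # arbitrary seed point
    dist = ((X - X[0]) ** 2).sum(-1)       # distance to chosen set
    for _ in range(m - 1):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, ((X - X[nxt]) ** 2).sum(-1))
    return chosen

# Two well-separated clusters; representatives should cover both.
X = np.array([[0., 0.], [0., 1.], [10., 0.], [10., 1.]])
A = knn_graph(X, k=1)
reps = greedy_k_center(X, m=2)
```

In the paper's setting, a GNN would instead learn node representations over such a graph and select representatives through that learned model; the sketch only shows the input structures involved.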