Timezone: »
An Extreme Point Approach to Subset Selection
Bill Kay · Viveck Cadambe
Sat Jul 24 03:40 PM -- 03:45 PM (PDT) @
Given a training data set, we develop a method to obtain a representative subset of training data meant for classification. Our focus is on subsets that are particularly suitable for training linear classification models. Our method is to reduce data sets in a manner that essentially preserves the convex hulls of the various classes of the data set. We develop scalable randomized algorithms for convex hull approximation, and obtain rigorous guarantees for models trained on subsets of linearly separable data sets. We empirically demonstrate the effectiveness of our approach for both training and data augmentation on the MNIST data set.
Author Information
Bill Kay (Oak Ridge National Laboratory)
Viveck Cadambe (The Pennsylvania State University)
More from the Same Authors
-
2021 : An Extreme Point Approach to Subset Selection »
Viveck Cadambe · Bill Kay -
2019 Workshop: Coding Theory For Large-scale Machine Learning »
Viveck Cadambe · Pulkit Grover · Dimitris Papailiopoulos · Gauri Joshi -
2019 Poster: Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization »
Farzin Haddadpour · Mohammad Mahdi Kamani · Mehrdad Mahdavi · Viveck Cadambe -
2019 Oral: Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization »
Farzin Haddadpour · Mohammad Mahdi Kamani · Mehrdad Mahdavi · Viveck Cadambe