Computationally Efficient Data Selection for Deep Learning

Cody Coleman

2021 Invited Talk
in
Workshop: Subset Selection in Machine Learning: From Theory to Applications

Abstract

Data selection methods, such as active learning and core-set selection, improve the data efficiency of machine learning by identifying the most informative data points to label or train on. The data selection literature offers many ways to identify these examples. However, classical data selection methods are prohibitively expensive to apply in deep learning because both the datasets and the models are much larger. To make these methods tractable, we propose (1) “selection via proxy” (SVP), which avoids expensive training and reduces the computation per example, and (2) “similarity search for efficient active learning and search” (SEALS), which reduces the number of examples processed. Both methods lead to order-of-magnitude performance improvements, making techniques like active learning on billions of unlabeled images practical for the first time.
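The two ideas in the abstract can be illustrated with a minimal Python sketch. This is not the speaker's implementation. The function names (`entropy`, `select_most_uncertain`, `neighbor_candidates`), the entropy-based uncertainty score, and the brute-force nearest-neighbor search are all assumptions chosen for illustration. The first selection function ranks examples using predictions that would come from a small, cheap proxy model (the SVP idea); the second limits the candidate pool to neighbors of already labeled examples (the SEALS idea).

```python
import numpy as np

def entropy(probs):
    """Predictive entropy per row; higher means the model is less certain."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_most_uncertain(probs, candidates, k):
    """Pick the k candidates whose (hypothetical proxy-model) predictions
    have the highest entropy."""
    probs = np.asarray(probs, dtype=float)
    candidates = np.asarray(candidates)
    scores = entropy(probs[candidates])
    order = np.argsort(-scores, kind="stable")[:k]
    return candidates[order]

def neighbor_candidates(embeddings, labeled, n_neighbors):
    """Restrict the pool to the nearest neighbors of the labeled examples.
    Brute-force search is an assumption made to keep the sketch short."""
    embeddings = np.asarray(embeddings, dtype=float)
    labeled = np.asarray(labeled)
    # Squared Euclidean distance from each labeled point to every point.
    diffs = embeddings[labeled][:, None, :] - embeddings[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    dists[:, labeled] = np.inf  # never return already labeled points
    nearest = np.argsort(dists, axis=1, kind="stable")[:, :n_neighbors]
    return np.unique(nearest)

# Toy data: five 2-D embeddings and made-up proxy class probabilities.
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 10.0]]
proxy_probs = [[0.99, 0.01], [0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [0.5, 0.5]]

pool = neighbor_candidates(embeddings, labeled=[0], n_neighbors=2)
chosen = select_most_uncertain(proxy_probs, pool, k=1)
```

In this toy run only the two nearest neighbors of the labeled point are scored, so the highly uncertain example at index 4 is never examined. At real scale, the probabilities would come from a trained proxy model and the neighbor lookup from an approximate similarity-search index rather than a dense distance matrix.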

Speaker

Cody Coleman

Cody recently completed a computer science PhD at Stanford University, advised by Professors Matei Zaharia and Peter Bailis. His research focuses on democratizing machine learning by reducing the cost of producing state-of-the-art models and creating novel abstractions that simplify machine learning development and deployment. His work ranges from performance benchmarking of hardware and software systems (e.g., DAWNBench and MLPerf) to computationally efficient methods for active learning and core-set selection. He completed his B.S. and M.Eng. in electrical engineering and computer science at MIT.
