Workshop: Subset Selection in Machine Learning: From Theory to Applications

A Data Subset Selection Framework for Efficient Hyper-Parameter Tuning and Automatic Machine Learning

Savan Amitbhai Visalpara · Krishnateja Killamsetty · Rishabh Iyer


In recent years, deep learning models have achieved great success in tasks such as object detection, speech recognition, and translation, making people's everyday lives easier. Despite this success, training a deep learning model is often challenging because its performance depends heavily on the hyper-parameters used. Moreover, finding the best hyper-parameter configuration is often time-consuming, even with state-of-the-art (SOTA) hyper-parameter optimization algorithms, because they require multiple training runs over the entire dataset for different candidate hyper-parameter configurations. Our main insight is that performing the training runs involved in hyper-parameter optimization on a subset that is representative of the entire dataset allows us to find the optimal hyper-parameter configuration significantly faster. In this work, we explore using data subsets selected by existing supervised-learning-based data subset selection methods, namely CRAIG, GLISTER, and GRAD-MATCH, for these training runs. Further, we empirically demonstrate through several experiments on real-world datasets that using data subsets for hyper-parameter optimization yields significantly faster turnaround times for hyper-parameter selection, while the selected hyper-parameters achieve performance comparable to those found using the entire dataset.
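The core idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a toy 1-D regression problem, a plain grid search over the learning rate, and a uniformly random 10% subset as a stand-in for the subsets that CRAIG, GLISTER, or GRAD-MATCH would select. All function names (`make_data`, `train`, `mse`, `tune`) and the candidate values are hypothetical and chosen only for illustration.

```python
import random

def make_data(n, rng):
    # Synthetic 1-D regression data: y = 3x + Gaussian noise (illustrative only).
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs]

def train(data, lr, epochs=20):
    # Fit y = w * x with plain SGD on the squared error; returns the weight.
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2.0 * (w * x - y) * x
    return w

def mse(data, w):
    # Mean squared error of the model y = w * x on the given data.
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def tune(train_data, val_data, candidates):
    # Grid search: one training run per candidate, keep the lowest validation error.
    scores = {lr: mse(val_data, train(train_data, lr)) for lr in candidates}
    return min(scores, key=scores.get), scores

rng = random.Random(0)
full_train = make_data(2000, rng)
val = make_data(500, rng)
candidates = [1e-4, 1e-3, 1e-2, 1e-1]  # hypothetical learning-rate grid

# Assumption: a uniform random 10% subset stands in for a subset chosen by a
# dedicated selection method such as CRAIG, GLISTER, or GRAD-MATCH.
subset = rng.sample(full_train, k=len(full_train) // 10)

# Every training run inside the search now touches 200 points instead of 2000.
best_subset_lr, subset_scores = tune(subset, val, candidates)
best_full_lr, full_scores = tune(full_train, val, candidates)

print("best lr (subset):", best_subset_lr)
print("best lr (full):  ", best_full_lr)
```

In this sketch each training run on the subset processes one tenth of the examples, so the whole search does roughly one tenth of the gradient steps. Whether the configuration found on the subset matches the one found on the full data depends on how representative the subset is, which is the reason for using a principled selection method rather than random sampling.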