Poster in Workshop: Structured Probabilistic Inference and Generative Modeling
Regression-Stratified Sampling for Optimized Algorithm Selection in Time-Constrained Tabular AutoML
Mehdi Bahrami · So Hasegawa · Lei Liu · Wei-Peng Chen
Keywords: [ Tabular AutoML ] [ Regression-Stratified Sampling ] [ Time-Constrained AutoML ] [ Probability Density Function (PDF) ] [ Algorithm Selection ]
The selection of a machine learning (ML) algorithm is an indispensable step in tabular AutoML training. Finding an optimized algorithm in a search space can be expensive for large tabular datasets, especially under time constraints. In this study, we introduce a novel Regression-Stratified Sampling approach that optimizes algorithm selection by drawing a data subset whose target-variable distribution stays close to that of the full-scale dataset, with closeness measured through their Probability Density Functions (PDFs). Additionally, we introduce a PDF Energy metric, based on relative entropy, to identify an optimized ML algorithm from the search space. Our comprehensive evaluation demonstrates that the proposed approach successfully selects optimized algorithms from a search space of atomic and ensemble models, outperforming simple random sampling. We also conduct a thorough comparison against Kullback-Leibler (KL) divergence, in which the PDF Energy metric proves superior for algorithm selection. Furthermore, we validate our approach for ML algorithm selection in an end-to-end scenario across 31 public datasets using 6 tabular AutoML tools. The empirical results indicate that our method makes efficient use of Regression-Stratified Sampling and, through the PDF Energy metric, reliably identifies an optimized ML algorithm for tabular data under time constraints.
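The abstract does not spell out the exact sampling procedure or the PDF Energy formula, so the following is only a minimal sketch of the general idea under assumptions of our own: a single continuous target, equal-frequency (quantile) strata over that target, histogram-based PDF estimates, and plain KL divergence as the distance. KL divergence is the baseline the authors compare against, not their PDF Energy metric. All function names here are hypothetical.

```python
import bisect
import math
import random

# Hypothetical helpers; NOT the authors' Regression-Stratified Sampling or PDF Energy.

def quantile_strata(y, n_strata):
    """Split indices into n_strata equal-frequency groups by the rank of y."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    strata = [[] for _ in range(n_strata)]
    for rank, idx in enumerate(order):
        strata[rank * n_strata // len(y)].append(idx)
    return strata

def stratified_sample(y, fraction, n_strata=10, seed=0):
    """Draw the same fraction from every target-value stratum (assumed scheme)."""
    rng = random.Random(seed)
    picked = []
    for stratum in quantile_strata(y, n_strata):
        if not stratum:
            continue
        k = max(1, round(len(stratum) * fraction))
        picked.extend(rng.sample(stratum, k))
    return sorted(picked)

def equal_width_edges(y, n_bins):
    """Histogram bin edges spanning the full target range."""
    lo, hi = min(y), max(y)
    width = (hi - lo) / n_bins or 1.0
    return [lo + i * width for i in range(n_bins + 1)]

def histogram_pdf(values, edges, eps=1e-9):
    """Discrete PDF estimate; eps smoothing keeps every bin non-zero."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        j = bisect.bisect_right(edges, v) - 1
        counts[min(max(j, 0), len(counts) - 1)] += 1
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

if __name__ == "__main__":
    rng = random.Random(42)
    y = [rng.lognormvariate(0, 1) for _ in range(5000)]  # synthetic skewed target
    edges = equal_width_edges(y, 30)
    p_full = histogram_pdf(y, edges)
    strat = stratified_sample(y, 0.05, seed=1)
    simple = random.Random(1).sample(range(len(y)), len(strat))
    print("stratified KL:", kl_divergence(p_full, histogram_pdf([y[i] for i in strat], edges)))
    print("simple random KL:", kl_divergence(p_full, histogram_pdf([y[i] for i in simple], edges)))
```

Stratifying on the target ensures that rare tails of a skewed regression target are represented in the subset in proportion to their size. Pure random sampling can miss those tails by chance, and the distance printed for each subset makes that difference measurable.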