Skip to yearly menu bar Skip to main content

Workshop: Subset Selection in Machine Learning: From Theory to Applications

Bayesian decision analysis for collecting nearly-optimal subsets

Daniel Kowal


Subset selection is valuable for interpretable learning, scientific discovery, and data compression. However, classical subset selection is often eschewed due to instability and computational bottlenecks. We address these challenges using Bayesian decision analysis. For any Bayesian predictive model, we elicit an acceptable family of subsets that provide nearly-optimal linear prediction. The Bayesian model regularizes the linear coefficients and propagates uncertainty quantification for key predictive comparisons. The acceptable family spawns new variable importance metrics based on whether a variable appear in all, some, or no acceptable subsets. The proposed approach exhibits excellent prediction and stability for both simulated data (including p = 400 > n) and a large education dataset with highly correlated covariates.