Poster in Workshop: Sampling and Optimization in Discrete Space

Sequential Attention for Feature Selection

Taisuke Yasuda · Mohammad Hossein Bateni · Lin Chen · Matthew Fahrbach · Thomas Fu · Vahab Mirrokni


Abstract: Feature selection is the problem of selecting a subset of features for a machine learning model that maximizes model quality subject to a budget constraint. For neural networks, prior methods, including those based on $\ell_1$ regularization, attention, and other techniques, typically select the entire feature subset in one evaluation round, ignoring the residual value of features during selection, i.e., the marginal contribution of a feature given that other features have already been selected. We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. This algorithm is based on an efficient one-pass implementation of greedy forward selection and uses attention weights at each step as a proxy for feature importance. We give theoretical insights into our algorithm for linear regression by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm, and thus inherits all of its provable guarantees. Our theoretical and empirical analyses offer new explanations for the effectiveness of attention and its connections to overparameterization, which may be of independent interest.
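
For readers unfamiliar with the OMP algorithm the abstract refers to, the following is a minimal pure-Python sketch of classical Orthogonal Matching Pursuit for linear regression. It is not the authors' Sequential Attention implementation and involves no attention weights. It only illustrates the "residual value" idea: at each step the feature most correlated with the current residual is selected, and the model is refit on all selected features. The function names (`omp`, `solve_linear_system`, `dot`), the column-norm scaling of the scores, and the stopping tolerance are illustrative assumptions.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve_linear_system(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def omp(X: List[List[float]], y: List[float], k: int, tol: float = 1e-12) -> List[int]:
    """Greedy forward selection of at most k feature indices (classical OMP).

    X is a list of rows (samples), y the regression targets, k the budget.
    Returns the selected column indices in the order they were chosen.
    """
    n, d = len(X), len(X[0])
    cols = [[float(X[i][j]) for i in range(n)] for j in range(d)]
    norms = [dot(c, c) ** 0.5 for c in cols]
    selected: List[int] = []
    residual = [float(v) for v in y]
    for _ in range(k):
        # Score each unselected feature by its normalized correlation with
        # the residual, i.e. its marginal value given the selected features.
        best_j, best_score = -1, 0.0
        for j in range(d):
            if j in selected or norms[j] == 0.0:
                continue
            score = abs(dot(cols[j], residual)) / norms[j]
            if score > best_score:
                best_j, best_score = j, score
        if best_j < 0 or best_score <= tol:
            break  # no remaining feature explains the residual
        selected.append(best_j)
        # Refit least squares on the selected features (normal equations),
        # then recompute the residual of the refitted model.
        gram = [[dot(cols[a], cols[b]) for b in selected] for a in selected]
        rhs = [dot(cols[a], y) for a in selected]
        coef = solve_linear_system(gram, rhs)
        residual = [
            y[i] - sum(w * cols[s][i] for w, s in zip(coef, selected))
            for i in range(n)
        ]
    return selected
```

For example, with `X` the 4×4 identity matrix and `y = [3.0, 0.0, 1.0, 0.0]`, calling `omp(X, y, 3)` selects feature 0, then feature 2, and then stops early because the residual is zero. Refitting after every selection is what distinguishes this from a one-shot ranking of features by correlation with `y`.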