Sequential Attention for Feature Selection
Taisuke Yasuda · Mohammad Hossein Bateni · Lin Chen · Matthew Fahrbach · Thomas Fu · Vahab Mirrokni
Event URL: https://openreview.net/forum?id=Pn7CPyHmrr
Feature selection is the problem of selecting a subset of features for a machine learning model that maximizes model quality subject to a budget constraint. For neural networks, prior methods, including those based on $\ell_1$ regularization, attention, and other techniques, typically select the entire feature subset in one evaluation round, ignoring the residual value of features during selection, i.e., the marginal contribution of a feature given that other features have already been selected. We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. This algorithm is based on an efficient one-pass implementation of greedy forward selection and uses attention weights at each step as a proxy for feature importance. We give theoretical insights into our algorithm for linear regression by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm, and thus inherits all of its provable guarantees. Our theoretical and empirical analyses offer new explanations for the effectiveness of attention and its connections to overparameterization, which may be of independent interest.
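To make the selection loop described in the abstract concrete, here is a minimal NumPy sketch of the idea for linear regression (the setting of the OMP analysis): in each round, softmax attention logits over the not-yet-selected features are trained jointly with the model weights, and the feature with the largest attention weight is added to the selected set. The function name, optimizer, and hyperparameters below are illustrative assumptions for exposition, not the authors' released neural-network implementation.

```python
import numpy as np

def sequential_attention_select(X, y, k, num_steps=500, lr=0.1, seed=0):
    """Greedy forward selection driven by softmax attention weights (sketch).

    Each round jointly trains, by plain gradient descent, a linear model and
    attention logits over the not-yet-selected features; the feature with the
    largest attention weight is then added to the selected set.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    selected = []
    for _ in range(k):
        candidates = [j for j in range(d) if j not in selected]
        logits = np.zeros(len(candidates))
        w = rng.normal(scale=0.01, size=d)
        for _ in range(num_steps):
            attn = np.exp(logits - logits.max())
            attn /= attn.sum()                # softmax over candidate features
            scale = np.ones(d)
            scale[candidates] = attn          # already-selected features pass through unscaled
            Xs = X * scale
            resid = Xs @ w - y
            grad_w = Xs.T @ resid / n
            # gradient w.r.t. the per-candidate scales, then back through the softmax
            grad_scale = (X[:, candidates] * w[candidates]).T @ resid / n
            grad_logits = attn * (grad_scale - np.dot(attn, grad_scale))
            w -= lr * grad_w
            logits -= lr * grad_logits
        selected.append(candidates[int(np.argmax(logits))])
    return selected
```

For example, `sequential_attention_select(X, y, k=5)` would greedily return five feature indices; the abstract notes that a suitable adaptation of this procedure to linear regression is equivalent to Orthogonal Matching Pursuit.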
Author Information
Taisuke Yasuda (School of Computer Science, Carnegie Mellon University)
Mohammad Hossein Bateni (Google Research)
Lin Chen (Yale University)
Matthew Fahrbach (Google Research)
Thomas Fu (Google Research)
Vahab Mirrokni (Google Research)
More from the Same Authors
- 2021: Label differential privacy via clustering
  Hossein Esfandiari · Vahab Mirrokni · Umar Syed · Sergei Vassilvitskii
- 2023: Preference Elicitation for Music Recommendations
  Ofer Meshi · Jon Feldman · Li Yang · Ben Scheetz · Yanli Cai · Mohammad Hossein Bateni · Corbyn Salisbury · Vikram Aggarwal · Craig Boutilier
- 2023: Tackling Provably Hard Representative Selection via Graph Neural Networks
  Mehran Kazemi · Anton Tsitsulin · Hossein Esfandiari · Mohammad Hossein Bateni · Deepak Ramachandran · Bryan Perozzi · Vahab Mirrokni
- 2023: Tensor Proxies for Efficient Feature Cross Search
  Taisuke Yasuda · Mohammad Hossein Bateni · Lin Chen · Matthew Fahrbach · Thomas Fu
- 2023: k-Means Clustering with Distance-Based Privacy
  Alessandro Epasto · Vahab Mirrokni · Shyam Narayanan · Peilin Zhong
- 2023 Oral: Robust Budget Pacing with a Single Sample
  Santiago Balseiro · Rachitesh Kumar · Vahab Mirrokni · Balasubramanian Sivan · Di Wang
- 2023 Poster: Robust Budget Pacing with a Single Sample
  Santiago Balseiro · Rachitesh Kumar · Vahab Mirrokni · Balasubramanian Sivan · Di Wang
- 2023 Poster: Sharper Bounds for $\ell_p$ Sensitivity Sampling
  David Woodruff · Taisuke Yasuda
- 2023 Poster: Robust and private stochastic linear bandits
  Vasilis Charisopoulos · Hossein Esfandiari · Vahab Mirrokni
- 2023 Poster: Learning Rate Schedules in the Presence of Distribution Shift
  Matthew Fahrbach · Adel Javanmard · Vahab Mirrokni · Pratik Worah
- 2023 Oral: Differentially Private Hierarchical Clustering with Provable Approximation Guarantees
  Jacob Imola · Alessandro Epasto · Mohammad Mahdian · Vincent Cohen-Addad · Vahab Mirrokni
- 2023 Poster: Differentially Private Hierarchical Clustering with Provable Approximation Guarantees
  Jacob Imola · Alessandro Epasto · Mohammad Mahdian · Vincent Cohen-Addad · Vahab Mirrokni
- 2023 Oral: Sharper Bounds for $\ell_p$ Sensitivity Sampling
  David Woodruff · Taisuke Yasuda
- 2023 Poster: Multi-channel Autobidding with Budget and ROI Constraints
  Yuan Deng · Negin Golrezaei · Patrick Jaillet · Jason Cheuk Nam Liang · Vahab Mirrokni
- 2023 Poster: Approximately Optimal Core Shapes for Tensor Decompositions
  Mehrdad Ghadiri · Matthew Fahrbach · Thomas Fu · Vahab Mirrokni
- 2020 Poster: More Data Can Expand The Generalization Gap Between Adversarially Robust and Standard Models
  Lin Chen · Yifei Min · Mingrui Zhang · Amin Karbasi
- 2019 Poster: Categorical Feature Compression via Submodular Optimization
  Mohammad Hossein Bateni · Lin Chen · Hossein Esfandiari · Thomas Fu · Vahab Mirrokni · Afshin Rostamizadeh
- 2019 Oral: Categorical Feature Compression via Submodular Optimization
  Mohammad Hossein Bateni · Lin Chen · Hossein Esfandiari · Thomas Fu · Vahab Mirrokni · Afshin Rostamizadeh
- 2019 Poster: Distributed Weighted Matching via Randomized Composable Coresets
  Sepehr Assadi · Mohammad Hossein Bateni · Vahab Mirrokni
- 2019 Oral: Distributed Weighted Matching via Randomized Composable Coresets
  Sepehr Assadi · Mohammad Hossein Bateni · Vahab Mirrokni