Predictive Automatic Relevance Determination by Expectation Propagation
Yuan Qi - MIT
Thomas Minka - Microsoft Research
Rosalind Picard - MIT
Zoubin Ghahramani - University College London
In many real-world classification problems the input contains a large number of potentially irrelevant features. This paper proposes a new Bayesian framework for determining the relevance of input features. This approach extends one of the most successful Bayesian methods for feature selection and sparse learning, known as Automatic Relevance Determination (ARD). ARD finds the relevance of features by optimizing the model marginal likelihood, also known as the evidence. We show that this can lead to overfitting. To address this problem, we propose Predictive ARD based on estimating the predictive performance of the classifier. While the actual leave-one-out predictive performance is generally very costly to compute, the expectation propagation (EP) algorithm proposed by Minka provides an estimate of this predictive performance as a side-effect of its iterations. We exploit this in our algorithm to do feature selection, and to select data points in a sparse Bayesian kernel classifier. Moreover, we provide two other improvements to previous algorithms, by replacing Laplace's approximation with the generally more accurate EP, and by incorporating the fast optimization algorithm proposed by Faul and Tipping. Our experiments show that our method based on the EP estimate of predictive performance is more accurate on test data than relevance determination by optimizing the evidence.