
Bayesian Model Selection, the Marginal Likelihood, and Generalization
Sanae Lotfi · Pavel Izmailov · Gregory Benton · Micah Goldblum · Andrew Wilson

Thu Jul 21 03:00 PM -- 05:00 PM (PDT) @ Hall E #828

How do we compare hypotheses that are entirely consistent with observations? The marginal likelihood (also known as the Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question, automatically encoding Occam's razor. Although it has been observed that the marginal likelihood can overfit and is sensitive to prior assumptions, its limitations for hyperparameter learning and discrete model comparison have not been thoroughly investigated. We first revisit the appealing properties of the marginal likelihood for learning constraints and hypothesis testing. We then highlight the conceptual and practical issues in using the marginal likelihood as a proxy for generalization. Namely, we show how the marginal likelihood can be negatively correlated with generalization, with implications for neural architecture search, and can lead to both underfitting and overfitting in hyperparameter learning. We provide a partial remedy through a conditional marginal likelihood, which we show is more aligned with generalization and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.
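
For readers who want the quantities in symbols, the following is a minimal sketch of the standard definitions the abstract refers to. The notation is an assumption made here for illustration and does not appear in this listing: a model $\mathcal{M}$ with parameters $w$, a dataset $\mathcal{D}$ of $n$ points, and a cutoff $m < n$ for the conditional variant.

```latex
% Marginal likelihood: the likelihood averaged over the prior on parameters w
% (notation assumed for illustration)
p(\mathcal{D} \mid \mathcal{M})
  = \int p(\mathcal{D} \mid w, \mathcal{M}) \, p(w \mid \mathcal{M}) \, dw

% Chain-rule decomposition of the log marginal likelihood over data points
\log p(\mathcal{D} \mid \mathcal{M})
  = \sum_{i=1}^{n} \log p(\mathcal{D}_i \mid \mathcal{D}_{<i}, \mathcal{M})

% Conditional marginal likelihood: keep only the terms after the first m points
\log p(\mathcal{D}_{>m} \mid \mathcal{D}_{\le m}, \mathcal{M})
  = \sum_{i=m+1}^{n} \log p(\mathcal{D}_i \mid \mathcal{D}_{<i}, \mathcal{M})
```

One way to read the decomposition: the earliest terms score predictions made almost entirely from the prior, which is where sensitivity to prior assumptions enters. Conditioning on the first $m$ points drops those terms, so the remaining sum evaluates predictions made after the model has seen some data.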

Author Information

Sanae Lotfi (New York University)

I am a PhD student at the Center for Data Science at NYU and a DeepMind fellow, advised by Professor Andrew Gordon Wilson. I am currently interested in designing robust models that can generalize well in and out of distribution. I also work on the closely related question of understanding and quantifying the generalization properties of deep neural networks. More broadly, my research interests include out-of-distribution generalization, Bayesian learning, probabilistic modeling, large-scale optimization, and loss surface analysis. Prior to NYU, I obtained a master’s degree in applied mathematics from Polytechnique Montreal. I was fortunate to work there with Professors Andrea Lodi and Dominique Orban to design stochastic first- and second-order algorithms with compelling theoretical and empirical properties for machine learning and large-scale optimization. I received the Best Master’s Thesis Award in Applied Mathematics at Polytechnique Montreal for this work. I also hold an engineering degree in general engineering and applied mathematics from CentraleSupélec.

Pavel Izmailov (New York University)
Gregory Benton (New York University)
Micah Goldblum (New York University)
Andrew Wilson (New York University)

Andrew Gordon Wilson is a faculty member in the Courant Institute and Center for Data Science at NYU. His interests include probabilistic modelling, Gaussian processes, Bayesian statistics, physics-inspired machine learning, and loss surfaces and generalization in deep learning. His webpage is https://cims.nyu.edu/~andrewgw.
