Poster

Variational Learning is Effective for Large Deep Networks

Yuesong Shen · Nico Daheim · Gian Maria Marconi · Peter Nickl · Bai Cong · Bazan Raoul · Rio Yokota · Iryna Gurevych · Daniel Cremers · Khan Emtiyaz · Thomas Moellenhoff


Abstract:

We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam's, but its predictive uncertainty is better. We show several new use cases of IVON where we improve fine-tuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence in support of the effectiveness of variational learning.
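For context on what the optimizer does, below is a minimal single-tensor sketch of an IVON-style update, reconstructed from the general variational online Newton recipe: sample weights from a diagonal Gaussian posterior, estimate the diagonal Hessian cheaply via reparameterization, and precondition the mean update much like Adam. The function name `ivon_step`, its hyperparameters, and their default values are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch of one IVON-style update on a single parameter tensor.
# Posterior: q(theta) = N(m, sigma^2), sigma^2 = 1 / (ess * (h + weight_decay)).
# All names and default values are assumptions made for illustration.
import torch

def ivon_step(m, h, g_avg, grad_fn, step, lr=0.1, beta1=0.9,
              beta2=0.99999, weight_decay=1e-4, ess=1e5):
    """One variational online Newton step; m, h, g_avg are updated in place."""
    sigma = 1.0 / torch.sqrt(ess * (h + weight_decay))
    theta = m + sigma * torch.randn_like(m)   # sample weights from the posterior
    grad = grad_fn(theta)                     # minibatch gradient at the sample

    # Cheap diagonal-Hessian estimate via the reparameterization trick.
    h_hat = grad * (theta - m) / sigma.pow(2)

    g_avg.mul_(beta1).add_(grad, alpha=1 - beta1)        # gradient momentum
    corr = 0.5 * (1 - beta2) ** 2 * (h - h_hat).pow(2) / (h + weight_decay)
    h.mul_(beta2).add_(h_hat, alpha=1 - beta2).add_(corr)

    g_hat = g_avg / (1 - beta1 ** step)                  # bias-corrected gradient
    m.sub_(lr * (g_hat + weight_decay * m) / (h + weight_decay))

# Toy usage: minimize 0.5 * ||theta - 1||^2; the posterior mean should approach 1.
m, h, g_avg = torch.zeros(3), torch.ones(3), torch.zeros(3)
for t in range(1, 201):
    ivon_step(m, h, g_avg, grad_fn=lambda th: th - 1.0, step=t)
print(m)
```

The cost profile of such an update is one gradient evaluation per step plus elementwise buffer updates, which is consistent with the abstract's claim of near-identical cost to Adam, while the Hessian buffer additionally provides a posterior variance for uncertainty estimates.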
