

Poster in Workshop: PAC-Bayes Meets Interactive Learning

Flat minima can fail to transfer to downstream tasks

Deepansha Singh · Ekansh Sharma · Daniel Roy · Gintare Karolina Dziugaite


Abstract:

Large neural networks trained on one task are often finetuned and reused on different but related downstream tasks. The prospect of general principles that might improve transferability is enticing, as pretraining is exceptionally resource intensive. In recent work, Liu et al. (2022) propose using flatness as a metric to judge the transferability of pretrained neural networks, based on the observation that, on a suite of benchmarks, flatter minima led to better transfer. Is this a general principle? In this extended abstract, we show that flatness is not a reliable indicator of transferability, despite flatness having been linked to generalization via PAC-Bayes and empirical analysis. We demonstrate that whether flatness helps or hurts depends on the relationship between the source and target tasks.
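The abstract does not specify how flatness is measured; as context, one common proxy in the PAC-Bayes and sharpness literature is the expected increase in training loss under small random perturbations of the weights. The sketch below is purely illustrative and is not the authors' or Liu et al.'s method; the function name `sharpness_proxy` and the parameters `sigma` and `n_samples` are hypothetical choices for this example.

```python
# Illustrative sketch (assumed proxy, not the paper's definition): estimate
# flatness as E[L(w + eps)] - L(w) with eps ~ N(0, sigma^2 I), averaged over
# a few perturbation samples. Flatter minima give a smaller increase.
import copy
import torch

def sharpness_proxy(model, loss_fn, data_loader, sigma=0.01, n_samples=5, device="cpu"):
    model = model.to(device).eval()

    def avg_loss(m):
        # Average loss of model m over the given data loader.
        total, count = 0.0, 0
        with torch.no_grad():
            for x, y in data_loader:
                x, y = x.to(device), y.to(device)
                total += loss_fn(m(x), y).item() * x.size(0)
                count += x.size(0)
        return total / count

    base = avg_loss(model)
    perturbed = []
    for _ in range(n_samples):
        # Perturb a copy of the weights with isotropic Gaussian noise.
        noisy = copy.deepcopy(model)
        with torch.no_grad():
            for p in noisy.parameters():
                p.add_(sigma * torch.randn_like(p))
        perturbed.append(avg_loss(noisy))
    return sum(perturbed) / n_samples - base
```

A smaller value of this quantity indicates a flatter minimum; the abstract's claim is that such a measure, whatever its exact form, does not reliably predict downstream transfer.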
