Poster in Workshop: PAC-Bayes Meets Interactive Learning
Flat minima can fail to transfer to downstream tasks
Deepansha Singh · Ekansh Sharma · Daniel Roy · Gintare Karolina Dziugaite
Large neural networks trained on one task are often finetuned and reused on different but related downstream tasks. The prospect of general principles that might lead to improved transferability is very enticing, as pretraining is exceptionally resource intensive. In recent work, Liu et al. (2022) propose to use flatness as a metric to judge the transferability of pretrained neural networks, based on the observation that, on a suite of benchmarks, flatter minima led to better transfer. Is this a general principle? In this extended abstract, we show that flatness is not a reliable indicator of transferability, despite flatness having been linked to generalization via PAC-Bayes and empirical analysis. We demonstrate that whether flatness helps or hurts depends on the relationship between the source and target tasks.
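To make the notion of "flatness" concrete, the sketch below shows one common proxy: the average increase in loss when the trained weights are perturbed by isotropic Gaussian noise. This is only an illustrative measure, not the abstract's or Liu et al.'s exact definition, and the names `model`, `loss_fn`, `data_loader`, and `sigma` are assumptions for the example.

```python
# Minimal sketch of a flatness proxy: estimate E[L(w + eps) - L(w)]
# with eps ~ N(0, sigma^2 I). Assumes a PyTorch model and data loader.
import copy
import torch


def flatness_proxy(model, loss_fn, data_loader, sigma=0.01, n_samples=5):
    """Average loss increase under random Gaussian weight perturbations."""
    device = next(model.parameters()).device

    def avg_loss(m):
        total, count = 0.0, 0
        with torch.no_grad():
            for x, y in data_loader:
                x, y = x.to(device), y.to(device)
                total += loss_fn(m(x), y).item() * x.size(0)
                count += x.size(0)
        return total / count

    base = avg_loss(model)
    increases = []
    for _ in range(n_samples):
        noisy = copy.deepcopy(model)
        with torch.no_grad():
            for p in noisy.parameters():
                p.add_(sigma * torch.randn_like(p))
        increases.append(avg_loss(noisy) - base)
    # Larger values indicate a sharper minimum; smaller values, a flatter one.
    return sum(increases) / n_samples
```

Under this kind of proxy, the abstract's claim is that a pretrained model with a smaller perturbation-induced loss increase (a flatter minimum) need not finetune better on a downstream task; the outcome depends on how the source and target tasks are related.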