Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling

Teodora Srećković ⋅ Jonas Geiping ⋅ Antonio Orvieto

Abstract
