Analyzing & Eliminating Learning Rate Warmup in GPT Pre-Training

Atli Kosson ⋅ Bettina Messmer ⋅ Martin Jaggi

Abstract
