Analyzing & Eliminating Learning Rate Warmup in GPT Pre-Training

Atli Kosson · Bettina Messmer · Martin Jaggi

Abstract
