Poster

The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training

Jinbo Wang ⋅ Mingze Wang ⋅ Zhanpeng Zhou ⋅ Junchi Yan ⋅ Weinan E ⋅ Lei Wu
2025 Poster
