Poster

The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training

Jinbo Wang · Mingze Wang · Zhanpeng Zhou · Junchi Yan · Weinan E · Lei Wu
2025 Poster
