Poster Wed, Jul 16, 2025 • 11:00 AM – 1:30 PM PDT

The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training

Jinbo Wang · Mingze Wang · Zhanpeng Zhou · Junchi Yan · Weinan E · Lei Wu
