Poster in Workshop: HiLD: High-dimensional Learning Dynamics Workshop
On the Advantage of Lion Compared to signSGD with Momentum
Alessandro Noiato · Luca Biggio · Antonio Orvieto
Abstract:
This paper explores the relationship between the recently introduced Lion optimizer and the closely related signSGD with momentum (Signum), focusing on the different gradient averaging mechanisms they apply before the sign operator. A precise formula for the averaging mechanisms of Lion and Signum is derived, and the two are compared in the deterministic convex setting. Empirical investigations highlight the advantage of Lion, a finding further supported by a convergence guarantee in the stochastic setting. Our work suggests that Lion can effectively balance fast and slow averaging, leading to stable and rapid convergence.
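To make the two averaging mechanisms concrete, here is a minimal sketch of a single update step for each method. It follows the commonly published forms of the Lion and Signum updates, not code from this paper. The function names, the list-based parameter representation, the default hyperparameters, and the omission of weight decay are all assumptions made for illustration.

```python
def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def signum_step(theta, m, grad, lr=0.1, beta=0.9):
    """One Signum step (illustrative): a single moving average is both
    stored as momentum and passed to the sign operator."""
    m = [beta * mi + (1 - beta) * gi for mi, gi in zip(m, grad)]
    theta = [ti - lr * sign(mi) for ti, mi in zip(theta, m)]
    return theta, m


def lion_step(theta, m, grad, lr=0.1, beta1=0.9, beta2=0.99):
    """One Lion step (illustrative, weight decay omitted): the sign is taken
    of an interpolation with factor beta1, while the stored momentum is
    updated separately with factor beta2."""
    c = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    theta = [ti - lr * sign(ci) for ti, ci in zip(theta, c)]
    m = [beta2 * mi + (1 - beta2) * gi for mi, gi in zip(m, grad)]
    return theta, m


# Toy case: stored momentum is +1.0 and a large negative gradient arrives.
theta_s, m_s = signum_step([0.0], [1.0], [-20.0], beta=0.99)
theta_l, m_l = lion_step([0.0], [1.0], [-20.0], beta1=0.9, beta2=0.99)
```

In this toy case, Signum with `beta=0.99` still steps along the old momentum direction, while Lion steps along the fresh gradient and stores the same slowly averaged momentum. This is only one hand-picked example of fast and slow averaging acting together, not evidence for the paper's claims.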