Gradient Descent with Large Step Size Restores Symmetry in Multi-Pathway Deep Linear Networks
Abstract
Recent theoretical analyses of multi-pathway Deep Linear Networks, typically grounded in Gradient Flow, predict a "winner-takes-all" specialization in which pathway symmetry breaks and each feature concentrates in a single pathway. In this work, we show that discrete Gradient Descent (GD) with a large step size reproduces the initial, depth-driven symmetry breaking but ultimately overrides this tendency through catapult dynamics at the Edge of Stability. In this regime, GD exhibits an implicit preference for low-curvature minima. We prove that splitting singular values across pathways minimizes sharpness; the implicit preference of large-step GD for such flat minima therefore forces a subsequent rebalancing phase in which iterates escape sharp, sparse configurations for stable, balanced solutions. Together, these results clarify how architectural depth shapes pathway competition and explain why GD with a large step size ultimately favors shared representations rather than permanent pathway monopolization.
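As a minimal numerical sketch of the phenomenon the abstract describes (our illustrative construction, not the paper's model or experiments), consider two parallel depth-3 scalar pathways trained by full-batch GD on a single example. The depth per pathway, initialization, target, and both learning rates below are our assumptions chosen to contrast a near-gradient-flow step size with a step size large enough that sharp, lopsided solutions are unstable.

```python
# Toy sketch (our construction): two parallel depth-3 scalar pathways,
#   f(x) = (u1*u2*u3 + v1*v2*v3) * x,
# trained by full-batch GD on the single example (x, y) = (1, 1),
# i.e. loss = 0.5 * (u.prod() + v.prod() - 1)**2.
import numpy as np

def train(lr, steps, u0=0.30, v0=0.25, c=1.0):
    # Balanced init within each pathway; pathway 1 starts slightly ahead,
    # so depth-driven dynamics initially amplify its lead.
    u = np.full(3, u0)
    v = np.full(3, v0)
    for _ in range(steps):
        r = u.prod() + v.prod() - c          # residual
        # d f / d u_i is the product of the other two entries of u.
        gu = r * np.array([u[1] * u[2], u[0] * u[2], u[0] * u[1]])
        gv = r * np.array([v[1] * v[2], v[0] * v[2], v[0] * v[1]])
        u -= lr * gu
        v -= lr * gv
    return u.prod(), v.prod()                # per-pathway strengths

# Small step (near gradient flow) vs. a large step at which the sparse,
# sharp solution is unstable (both learning rates are our assumptions).
for lr in (0.01, 0.83):
    s1, s2 = train(lr, steps=50_000)
    print(f"lr={lr}: pathway strengths = ({s1:.3f}, {s2:.3f})")
```

If the sketch behaves as the abstract predicts, the small-step run should preserve and even amplify the initial pathway asymmetry, while the large-step run should end with a noticeably more even split after a catapult-driven oscillatory phase; the exact split is sensitive to the assumed depth, initialization, and step size.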