Gradient Flow Dynamics and Implicit Bias of Diagonal Linear Networks under Infinitesimal Initialization
Abstract
We study the gradient flow dynamics of diagonal linear networks for regression tasks under infinitesimal initialization. Extending the saddle-to-saddle dynamics described in Theorem 1 of Pesme & Flammarion (2023), we generalize the analysis to both deep diagonal linear networks and a broader class of two-layer diagonal linear networks (as defined in Definition 4.1). Specifically, we show that the training trajectories of these models can be equivalently characterized by the proposed Algorithm 1. We further prove that this algorithm converges to the solution of a modified ℓ1-norm minimization problem. Consequently, we establish that the implicit bias of both architectures corresponds to a modified ℓ1 norm in the regime of infinitesimal initialization. Finally, we shed light on the mechanisms governing these dynamics by identifying the Structural Invariant Manifold (SIM) (Zhao et al., 2025) as the key geometric structure shaping the learning process.
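The phenomenon the abstract refers to, gradient flow on a diagonal linear network from an infinitesimal initialization being implicitly biased toward a sparse, ℓ1-type solution, can be illustrated numerically. The sketch below is not the paper's Algorithm 1 and does not compute its modified ℓ1 norm; it is a minimal illustration of the classical setting, using the standard two-layer diagonal parameterization β = u⊙u − v⊙v on a toy underdetermined regression problem, with small-step gradient descent as a proxy for gradient flow. All problem sizes and variable names are illustrative assumptions.

```python
# Minimal sketch (not the paper's Algorithm 1): small-step gradient descent
# approximating the gradient flow of a two-layer diagonal linear network
# beta = u*u - v*v on an underdetermined regression problem. With an
# infinitesimal initialization scale alpha, the recovered beta is expected
# to approach a minimum-l1-norm interpolator, in contrast to the dense
# minimum-l2-norm interpolator. All names and sizes here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 50                        # fewer samples than features
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[:3] = [2.0, -1.5, 1.0]     # sparse ground truth
y = X @ beta_star

alpha = 1e-6                         # infinitesimal initialization scale
u = np.full(d, alpha)
v = np.full(d, alpha)
lr = 1e-3                            # small step size mimics gradient flow

for _ in range(500_000):
    beta = u**2 - v**2
    g = X.T @ (X @ beta - y) / n     # gradient of the squared loss in beta
    # Chain rule through beta = u^2 - v^2: dL/du = 2*g*u, dL/dv = -2*g*v
    u, v = u - lr * 2 * g * u, v + lr * 2 * g * v

beta = u**2 - v**2
beta_l2 = np.linalg.lstsq(X, y, rcond=None)[0]   # minimum-l2-norm interpolator
print(f"l1 norm: diagonal-net flow {np.abs(beta).sum():.3f}"
      f"  vs  min-l2 interpolator {np.abs(beta_l2).sum():.3f}")
print("approximate support recovered:", np.flatnonzero(np.abs(beta) > 1e-2))
```

For small alpha, the flow's solution should have a markedly smaller ℓ1 norm than the dense least-norm solution and an approximate support close to that of beta_star, consistent with the implicit bias the abstract describes; the paper's contribution is to characterize the analogous limit, via Algorithm 1 and the SIM, for deep and generalized two-layer diagonal architectures.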