Curvature-corrected learning dynamics in deep neural networks

Dongsung Huh

Keywords: [ Learning Theory ] [ Non-convex Optimization ] [ Supervised Learning ] [ Other ]

Deep neural networks exhibit complex learning dynamics due to the non-convexity of their loss landscapes. Second-order optimization methods facilitate learning by compensating for ill-conditioned curvature. We provide an analytical description of how curvature correction changes the learning dynamics of deep linear neural networks. The analysis reveals that curvature correction preserves the path of the parameter dynamics and thus modifies only the temporal profile along that path. This accelerates convergence by reducing the nonlinear effect of depth on the learning dynamics of the input-output map. Our analysis also reveals an undesirable effect of curvature correction: it compromises the stability of the parameter dynamics during learning, especially with the block-diagonal approximation of the natural gradient. We introduce fractional curvature correction, which resolves this vanishing/exploding update problem while retaining most of the acceleration benefit of full curvature correction.
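The idea of a "fractional" correction can be illustrated with a toy sketch. This is not the paper's method or model: it assumes a simple ill-conditioned quadratic loss L(theta) = 0.5 * sum(h * theta^2) with diagonal curvature h, and preconditions the gradient by h^(-alpha). Here alpha = 0 is plain gradient descent and alpha = 1 fully cancels the curvature. The function names `fractional_step` and `run` and the parameter `alpha` are hypothetical and chosen only for illustration.

```python
def fractional_step(theta, h_diag, lr, alpha):
    """One preconditioned gradient step on L(theta) = 0.5 * sum(h * theta**2).

    Hypothetical toy: the gradient h * theta is rescaled by h ** (-alpha).
    alpha = 0 is plain gradient descent; alpha = 1 fully cancels the curvature.
    """
    return [t - lr * (h ** -alpha) * (h * t) for t, h in zip(theta, h_diag)]


def run(h_diag, lr, alpha, steps):
    """Start every coordinate at 1.0 and return the parameters after `steps` updates."""
    theta = [1.0] * len(h_diag)
    for _ in range(steps):
        theta = fractional_step(theta, h_diag, lr, alpha)
    return theta


if __name__ == "__main__":
    curvature = [1.0, 0.01]  # ill-conditioned: condition number 100
    for alpha in (0.0, 0.5, 1.0):
        fast, slow = run(curvature, lr=0.5, alpha=alpha, steps=20)
        print(f"alpha={alpha}: fast={fast:.2e}, slow={slow:.2e}")
```

In this toy, each coordinate contracts by a factor of 1 - lr * h^(1 - alpha) per step, so with alpha = 1 the low-curvature direction converges as fast as the high-curvature one, and alpha = 0.5 already closes much of the gap. The quadratic does not reproduce the path-preservation or the vanishing/exploding-update effects that the abstract describes for deep linear networks; it only shows how the exponent interpolates between uncorrected and fully corrected convergence rates.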