
Should You Follow the Gradient Flow? Insights from Runge-Kutta Gradient Descent
Xiang Li · Antonio Orvieto

Recently, it has become popular in the machine learning community to model gradient-based optimization algorithms as ordinary differential equations (ODEs). Moreover, state-of-the-art optimizers such as SGD and Momentum can be recovered from the corresponding ODE using first-order numerical integrators, namely the explicit and symplectic Euler methods, respectively. In contrast, the properties of higher-order integrators in optimization have received very little theoretical or experimental investigation. In this paper, we analyze the properties of high-order Runge-Kutta (RK) integrators applied to gradient flows, in the context of both convex optimization and deep learning. We show that, while RK closely approximates the gradient flow, this closer approximation induces an increase in sharpness (the maximum eigenvalue of the Hessian) at the solution, a feature believed to be negatively correlated with generalization. In addition, we show that, while high-order RK descent methods are stable for a broad range of step sizes, their convergence speed (in terms of training loss) usually worsens as the method order increases.
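The correspondence between gradient descent and the explicit Euler method, and the stability/speed trade-off of a higher-order RK integrator, can be seen on a toy problem. The sketch below is a minimal illustration under stated assumptions, not the authors' code or experimental setup: it applies the classical fourth-order Runge-Kutta method (RK4) to the gradient flow dx/dt = -∇f(x) of a hypothetical quadratic f(x) = ½(x₁² + 10·x₂²), whose sharpness is 10. The function names euler_step, rk4_step and run, the curvatures H, and the step sizes are illustrative choices.

```python
import math
from typing import Callable, List

Vector = List[float]
GradFn = Callable[[Vector], Vector]
StepFn = Callable[[Vector, GradFn, float], Vector]


def euler_step(x: Vector, grad: GradFn, eta: float) -> Vector:
    """Explicit Euler on dx/dt = -grad f(x); this is exactly one gradient-descent step."""
    g = grad(x)
    return [xi - eta * gi for xi, gi in zip(x, g)]


def rk4_step(x: Vector, grad: GradFn, eta: float) -> Vector:
    """Classical fourth-order Runge-Kutta step on dx/dt = -grad f(x).

    Costs four gradient evaluations per step instead of one.
    """
    def field(y: Vector) -> Vector:
        # Vector field of the gradient flow: v(y) = -grad f(y).
        return [-gi for gi in grad(y)]

    def shift(y: Vector, a: float, k: Vector) -> Vector:
        # Returns y + a * k.
        return [yi + a * ki for yi, ki in zip(y, k)]

    k1 = field(x)
    k2 = field(shift(x, eta / 2, k1))
    k3 = field(shift(x, eta / 2, k2))
    k4 = field(shift(x, eta, k3))
    return [xi + eta / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def run(step: StepFn, x0: Vector, grad: GradFn, eta: float, n_steps: int) -> Vector:
    """Apply `step` n_steps times starting from x0."""
    x = list(x0)
    for _ in range(n_steps):
        x = step(x, grad, eta)
    return x


# Hypothetical test problem: f(x) = 0.5 * sum(h_i * x_i^2), sharpness max(H) = 10.
H = [1.0, 10.0]


def grad_f(x: Vector) -> Vector:
    return [h * xi for h, xi in zip(H, x)]


def loss(x: Vector) -> float:
    return 0.5 * sum(h * xi * xi for h, xi in zip(H, x))


x0 = [1.0, 1.0]
# Small step: both methods are stable; compare with the exact flow at t = 50 * 0.05.
gd_small = run(euler_step, x0, grad_f, 0.05, 50)
rk_small = run(rk4_step, x0, grad_f, 0.05, 50)
exact = [xi * math.exp(-h * 50 * 0.05) for h, xi in zip(H, x0)]
# Larger step: 0.25 > 2 / max(H), so explicit Euler is unstable, while RK4 is not.
gd_large = run(euler_step, x0, grad_f, 0.25, 50)
rk_large = run(rk4_step, x0, grad_f, 0.25, 50)

print("eta=0.05  GD loss:", loss(gd_small), " RK4 loss:", loss(rk_small))
print("eta=0.25  GD loss:", loss(gd_large), " RK4 loss:", loss(rk_large))
```

On this quadratic, explicit Euler (gradient descent) is stable only for step sizes below 2/10 = 0.2, whereas RK4 remains stable up to roughly 2.785/10, so with η = 0.25 gradient descent diverges while RK4 converges. With η = 0.05, RK4 tracks the exact flow x_i(t) = e^(-h_i·t) almost perfectly, yet gradient descent reaches a lower loss after the same number of steps: for 0 < ηh < 1 its per-step contraction factor 1 - ηh is smaller than the flow's e^(-ηh), and RK4 additionally spends four gradient evaluations per step. This toy case only echoes the trade-off described in the abstract; it does not reproduce the paper's experiments.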

Author Information

Xiang Li (ETH Zurich)
Antonio Orvieto (ETH Zurich)
