Escaping Saddles with Stochastic Gradients
Hadi Daneshmand · Jonas Kohler · Aurelien Lucchi · Thomas Hofmann

Wed Jul 11 05:50 AM -- 06:10 AM (PDT) @ A9

We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these directions. Furthermore, we show that, contrary to the case of isotropic noise, this variance is proportional to the magnitude of the corresponding eigenvalues and does not decrease with the dimensionality. Based on this observation, we propose a new assumption under which the injection of explicit, isotropic noise usually applied to make gradient descent escape saddle points can successfully be replaced by a simple SGD step. Additionally, under the same condition, we derive the first convergence rate for plain SGD to a second-order stationary point in a number of iterations that is independent of the problem dimension.
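The quantity discussed in the abstract can be measured numerically. The following is a minimal sketch, not the authors' experiment: it assumes a toy objective f(w) = mean_i sigmoid(w^T z_i) with hypothetical nonnegative synthetic data, chosen only so that the Hessian is guaranteed to have negative eigenvalues at the evaluation point. The function name `curvature_noise_demo` and all parameters are illustrative. It compares the fraction of stochastic-gradient energy along the most negative curvature direction with the same fraction for isotropic unit-norm noise, which is 1/d in expectation.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def curvature_noise_demo(d=20, n=2000, seed=0):
    """Toy comparison of SGD noise and isotropic noise along negative curvature.

    Assumed model (illustrative only): f(w) = mean_i sigmoid(w^T z_i).
    """
    rng = np.random.default_rng(seed)
    # Hypothetical data: nonnegative features, so w^T z_i > 0 for w = ones.
    Z = rng.uniform(0.0, 1.0, size=(n, d)) / d
    w = np.ones(d)

    t = Z @ w
    s = sigmoid(t)
    d1 = s * (1.0 - s)            # sigmoid'(t), always positive
    d2 = d1 * (1.0 - 2.0 * s)     # sigmoid''(t), negative whenever t > 0

    grads = d1[:, None] * Z               # per-sample gradients, shape (n, d)
    H = (Z * d2[:, None]).T @ Z / n       # Hessian of the averaged objective

    # eigh returns eigenvalues in ascending order: index 0 is the most negative.
    eigvals, eigvecs = np.linalg.eigh(H)
    lam_min, v = eigvals[0], eigvecs[:, 0]

    # Fraction of the stochastic-gradient second moment that lies along v.
    sgd_frac = np.mean((grads @ v) ** 2) / np.mean(np.sum(grads ** 2, axis=1))

    # Same fraction for isotropic noise drawn uniformly from the unit sphere.
    xi = rng.standard_normal((n, d))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)
    iso_frac = np.mean((xi @ v) ** 2)

    return lam_min, sgd_frac, iso_frac

if __name__ == "__main__":
    lam_min, sgd_frac, iso_frac = curvature_noise_demo()
    print(f"most negative eigenvalue: {lam_min:.3e}")
    print(f"SGD energy fraction along v: {sgd_frac:.3f}")
    print(f"isotropic energy fraction along v: {iso_frac:.3f}")
```

In this toy setting the large SGD component along v comes largely from the nonnegative data, so the numbers illustrate how the comparison is set up rather than reproduce the paper's results. Increasing `d` shows the isotropic fraction shrinking roughly as 1/d.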

Author Information

Hadi Daneshmand (ETH Zurich)
Jonas Kohler (ETH Zurich)
Aurelien Lucchi (ETH Zurich)
Thomas Hofmann (ETH Zurich)
