Talk in Workshop: Beyond first order methods in machine learning systems
Spotlight talk 2 - Ridge Riding: Finding diverse solutions by following eigenvectors of the Hessian
Jack Parker-Holder
Over the last decade, a single algorithm has changed many facets of our lives: Stochastic Gradient Descent (SGD). In the era of ever-decreasing loss functions, SGD and its various offspring are a key component of the success of deep neural networks. However, in some cases it may matter which local optimum is found, and this is often context-dependent. Examples frequently arise in machine learning, from shape-versus-texture features to ensemble methods and zero-shot coordination. In these settings, there are desired types of solutions which SGD on 'standard' loss functions will not find. In this paper, we present a different approach. Rather than following a locally greedy gradient, we instead follow the eigenvectors of the Hessian. We call these 'ridges'. By iteratively following and branching amongst the ridges, we effectively span the loss surface to find qualitatively different solutions. We show both theoretically and experimentally that our method, called Ridge Riding, offers a promising direction.
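The idea of following Hessian eigenvectors away from a saddle can be illustrated on a toy problem. The sketch below is not the authors' implementation: the two-dimensional loss, the fixed step size, the stopping rule (stop once the followed eigenvalue is no longer negative), and the final gradient-descent polish are all assumptions made for illustration, and the function names (`follow_ridge`, `descend`, `ridge_riding`) are hypothetical.

```python
import numpy as np

# Toy loss (an assumption for illustration): a saddle at the origin and
# two minima at (-1, 0) and (+1, 0).
def loss(p):
    x, y = p
    return 0.25 * x**4 - 0.5 * x**2 + 0.5 * y**2

def grad(p):
    x, y = p
    return np.array([x**3 - x, y])

def hessian(p):
    x, _ = p
    return np.array([[3.0 * x**2 - 1.0, 0.0], [0.0, 1.0]])

def follow_ridge(start, direction, step=0.05, max_steps=200):
    """Step along the Hessian eigenvector that best matches `direction`
    for as long as its eigenvalue stays negative."""
    p = np.asarray(start, dtype=float)
    v = np.asarray(direction, dtype=float)
    for _ in range(max_steps):
        eigvals, eigvecs = np.linalg.eigh(hessian(p))
        # Re-identify the ridge: pick the eigenvector closest to the
        # direction followed on the previous step.
        overlaps = eigvecs.T @ v
        k = int(np.argmax(np.abs(overlaps)))
        if eigvals[k] >= 0.0:
            break  # curvature is no longer negative; the ridge has ended
        sign = 1.0 if overlaps[k] >= 0.0 else -1.0
        v = sign * eigvecs[:, k]
        p = p + step * v
    return p

def descend(p, lr=0.1, steps=500):
    """Plain gradient descent, used here only to settle into the nearby
    minimum once the ridge ends (an assumption of this sketch)."""
    for _ in range(steps):
        p = p - lr * grad(p)
    return p

def ridge_riding(saddle, **kwargs):
    """Branch over every negative-curvature eigenvector at `saddle`,
    in both directions, and return one solution per branch."""
    saddle = np.asarray(saddle, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(hessian(saddle))
    solutions = []
    for k in np.flatnonzero(eigvals < 0.0):
        for sign in (1.0, -1.0):
            end = follow_ridge(saddle, sign * eigvecs[:, k], **kwargs)
            solutions.append(descend(end))
    return solutions

if __name__ == "__main__":
    for sol in ridge_riding([0.0, 0.0]):
        print(sol, loss(sol))
```

On this toy surface the two branches leave the saddle in opposite directions and settle in the two different minima, whereas gradient descent started exactly at the saddle would not move at all. The sketch branches only once, at the starting saddle; the abstract describes branching repeatedly among ridges, which this example does not attempt.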