Beyond first order methods in machine learning systems

Workshop

Albert S Berahas · Amir Gholaminejad · Anastasios Kyrillidis · Michael Mahoney · Fred Roosta

Fri 17 Jul, 8 a.m. PDT

Keywords: Optimization, Second Order Methods, Stochastic Gradient Descent

Optimization lies at the heart of many exciting developments in machine learning, statistics and signal processing. As models become more complex and datasets get larger, finding efficient, reliable and provable methods is one of the primary goals in these fields.

In the last few decades, much effort has been devoted to the development of first-order methods. These methods enjoy a low per-iteration cost and have optimal complexity, are easy to implement, and have proven to be effective for most machine learning applications. First-order methods, however, have significant limitations: (1) they require fine hyper-parameter tuning, (2) they do not incorporate curvature information, and thus are sensitive to ill-conditioning, and (3) they are often unable to fully exploit the power of distributed computing architectures.

Higher-order methods, such as Newton, quasi-Newton and adaptive gradient descent methods, are extensively used in many scientific and engineering domains. At least in theory, these methods possess several nice features: they exploit local curvature information to mitigate the effects of ill-conditioning, they avoid or diminish the need for hyper-parameter tuning, and they have enough concurrency to take advantage of distributed computing environments. Researchers have even developed stochastic versions of higher-order methods that achieve speed and scalability by incorporating curvature information in an economical and judicious manner. However, higher-order methods are often “undervalued.”

This workshop will attempt to shed light on this statement. Topics of interest include, but are not limited to, second-order methods, adaptive gradient descent methods, regularization techniques, as well as techniques based on higher-order derivatives.
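
As a small illustration of the curvature argument in the abstract, the sketch below compares plain gradient descent with Newton's method on a toy ill-conditioned quadratic. It is not taken from any of the workshop talks: the objective f(x) = 0.5 * sum(d_i * x_i^2), the curvature values in `diag`, the step size 1/L, and all function names are assumptions chosen only for illustration.

```python
# Toy comparison on f(x) = 0.5 * sum(d_i * x_i**2), whose Hessian is diag(d).
# All names and values here are hypothetical, chosen for illustration only.

def grad(x, diag):
    """Gradient of the toy quadratic: entry i is d_i * x_i."""
    return [d * xi for d, xi in zip(diag, x)]


def gradient_descent(x0, diag, steps):
    """Plain gradient descent with the classical step size 1 / L, L = max(diag)."""
    step = 1.0 / max(diag)
    x = list(x0)
    for _ in range(steps):
        g = grad(x, diag)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def newton(x0, diag, steps):
    """Newton's method: each gradient entry is rescaled by the inverse curvature."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x, diag)
        x = [xi - gi / d for xi, gi, d in zip(x, g, diag)]
    return x


def norm(x):
    """Euclidean norm, used as the distance to the minimizer at the origin."""
    return sum(xi * xi for xi in x) ** 0.5


diag = [1.0, 1000.0]  # curvatures differ by a factor of 1000 (condition number)
x0 = [1.0, 1.0]

gd_result = gradient_descent(x0, diag, steps=100)
newton_result = newton(x0, diag, steps=1)

print("gradient descent, 100 steps, distance to optimum:", norm(gd_result))
print("Newton, 1 step, distance to optimum:", norm(newton_result))
```

On this toy problem the low-curvature coordinate shrinks by only a factor of (1 - 1/1000) per gradient step, whereas the Newton step rescales each coordinate by its own curvature and lands on the minimizer of the quadratic in one step. Real machine learning objectives are neither quadratic nor diagonal, so this shows the mechanism only, not a performance claim.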

Timezone: America/Los_Angeles

Schedule

Fri 8:00 a.m. - 8:15 a.m.	Introductory notes (Talk)
Fri 8:15 a.m. - 9:00 a.m.	Talk by Peter Richtarik - Fast linear convergence of randomized BFGS (Talk)	Peter Richtarik
Fri 9:00 a.m. - 9:10 a.m.	Q&A with Peter Richtarik (Discussion)	Peter Richtarik
Fri 9:10 a.m. - 9:55 a.m.	Talk by Francis Bach - Second Order Strikes Back - Globally convergent Newton methods for ill-conditioned generalized self-concordant Losses (Talk)	Francis Bach
Fri 9:55 a.m. - 10:05 a.m.	Q&A with Francis Bach (Discussion)	Francis Bach
Fri 10:05 a.m. - 10:30 a.m.	Break until 10:30 a.m. (PDT)
Fri 10:30 a.m. - 10:40 a.m.	Spotlight talk 1 - A Second-Order Optimization Algorithm for Solving Problems Involving Group Sparse Regularization (Talk)	Daniel Robinson
Fri 10:40 a.m. - 10:50 a.m.	Spotlight talk 2 - Ridge Riding: Finding diverse solutions by following eigenvectors of the Hessian (Talk)	Jack Parker-Holder
Fri 10:50 a.m. - 11:00 a.m.	Spotlight talk 3 - PyHessian: Neural Networks Through the Lens of the Hessian (Talk)	Amir Gholaminejad
Fri 11:00 a.m. - 11:45 a.m.	Talk by Coralia Cartis - Dimensionality reduction techniques for large-scale optimization problems (Talk)	Coralia Cartis
Fri 11:45 a.m. - 11:55 a.m.	Q&A with Coralia Cartis (Discussion)	Coralia Cartis
Fri 11:55 a.m. - 1:30 p.m.	Break until 1:30 p.m. (PDT)
Fri 1:30 p.m. - 1:40 p.m.	Spotlight talk 4 - MomentumRNN: Integrating Momentum into Recurrent Neural Networks (Talk)	Minh Nguyen
Fri 1:40 p.m. - 1:50 p.m.	Spotlight talk 5 - Step-size Adaptation Using Exponentiated Gradient Updates (Talk)	Ehsan Amid
Fri 1:50 p.m. - 2:00 p.m.	Spotlight talk 6 - Competitive Mirror Descent (Talk)	Florian Schäfer
Fri 2:00 p.m. - 2:15 p.m.	Industry Panel - Talk by Boris Ginsburg - Large scale deep learning: new trends and optimization challenges (Talk)	Boris Ginsburg
Fri 2:15 p.m. - 2:30 p.m.	Industry Panel - Talk by Jonathan Hseu - ML Models in Production (Talk)	Jonathan Hseu
Fri 2:30 p.m. - 2:45 p.m.	Industry Panel - Talk by Andres Rodriguez - Shifting the DL industry to 2nd order methods (Talk)	Andres Rodriguez
Fri 2:45 p.m. - 3:00 p.m.	Industry Panel - Talk by Lin Xiao - Statistical Adaptive Stochastic Gradient Methods (Talk)	Lin Xiao
Fri 3:00 p.m. - 3:30 p.m.	Industry panel Q&A (Discussion)
Fri 3:30 p.m. - 4:15 p.m.	Talk by Rachel Ward - Weighted Optimization: better generalization by smoother interpolation (Talk)	Rachel Ward
Fri 4:15 p.m. - 4:25 p.m.	Q&A with Rachel Ward (Discussion)	Rachel Ward
Fri 4:25 p.m. - 5:00 p.m.	Break until 5:00 p.m. (PDT)
Fri 5:00 p.m. - 5:45 p.m.	Talk by Rio Yokota - Degree of Approximation and Overhead of Computing Curvature, Information, and Noise Matrices (Talk)	Rio Yokota
Fri 5:45 p.m. - 5:55 p.m.	Q&A with Rio Yokota (Discussion)	Rio Yokota
Fri 5:55 p.m. - 6:00 p.m.	Closing remarks (Discussion)