Beyond first-order methods in machine learning systems

Workshop

Albert S Berahas · Anastasios Kyrillidis · Fred Roosta · Amir Gholaminejad · Michael Mahoney · Rachael Tappenden · Raghu Bollapragada · Rixon Crane · J. Lyle Kim

Sat 24 Jul, 7 a.m. PDT

Optimization lies at the heart of many exciting developments in machine learning, statistics and signal processing. As models become more complex and datasets get larger, finding efficient, reliable and provable methods is one of the primary goals in these fields.

In the last few decades, much effort has been devoted to the development of first-order methods. These methods enjoy a low per-iteration cost, have optimal complexity, are easy to implement, and have proven effective for most machine learning applications. First-order methods, however, have significant limitations: (1) they require careful hyper-parameter tuning, (2) they do not incorporate curvature information and are therefore sensitive to ill-conditioning, and (3) they are often unable to fully exploit the power of distributed computing architectures.

Higher-order methods, such as Newton, quasi-Newton, and adaptive gradient descent methods, are extensively used in many scientific and engineering domains. At least in theory, these methods possess several attractive features: they exploit local curvature information to mitigate the effects of ill-conditioning, they avoid or diminish the need for hyper-parameter tuning, and they have enough concurrency to take advantage of distributed computing environments. Researchers have even developed stochastic versions of higher-order methods that achieve speed and scalability by incorporating curvature information in an economical and judicious manner. However, higher-order methods are often “undervalued.”

This workshop will attempt to shed light on this statement. Topics of interest include, but are not limited to, second-order methods, adaptive gradient descent methods, regularization techniques, and techniques based on higher-order derivatives.
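
The abstract's contrast between first-order and curvature-aware methods on ill-conditioned problems can be illustrated with a toy example. The sketch below is not taken from any workshop talk: the diagonal quadratic, its curvatures (1 and 1000), the starting point, and all function names are assumptions chosen only for illustration.

```python
# Illustrative sketch only: f(x) = 0.5 * (a1 * x1^2 + a2 * x2^2).
# The curvatures and the starting point are assumptions, picked so that
# the problem is ill-conditioned (condition number 1000).

CURVATURES = [1.0, 1000.0]  # diagonal entries of the (constant) Hessian


def gradient(x):
    """Gradient of f at x: component i is a_i * x_i."""
    return [a * xi for a, xi in zip(CURVATURES, x)]


def gradient_descent_step(x, step_size):
    """First-order update: x - step_size * grad f(x)."""
    g = gradient(x)
    return [xi - step_size * gi for xi, gi in zip(x, g)]


def newton_step(x):
    """Second-order update: x - H^{-1} grad f(x); H is diagonal here."""
    g = gradient(x)
    return [xi - gi / a for xi, gi, a in zip(x, g, CURVATURES)]


def distance_to_minimizer(x):
    """Euclidean distance to the minimizer, which is the origin."""
    return sum(xi * xi for xi in x) ** 0.5


x0 = [1.0, 1.0]

# Gradient descent with step size 1/L, where L is the largest curvature.
x_gd = x0
for _ in range(100):
    x_gd = gradient_descent_step(x_gd, 1.0 / max(CURVATURES))

# A single Newton step from the same starting point.
x_newton = newton_step(x0)

print("gradient descent, 100 steps:", distance_to_minimizer(x_gd))
print("Newton, 1 step:", distance_to_minimizer(x_newton))
```

With these assumed values, gradient descent is still about 0.9 away from the minimizer after 100 steps, because the low-curvature coordinate shrinks by only a factor of 0.999 per step, while the Newton step rescales each coordinate by its curvature and lands on the minimizer at once. The one-step convergence is specific to quadratics; on general objectives Newton-type methods typically need safeguards such as line searches or regularization.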

Timezone: America/Los_Angeles

Schedule

Sat 7:00 a.m. - 7:10 a.m.	Introductory remarks	Raghu Bollapragada
Sat 7:10 a.m. - 7:55 a.m.	Recent trends in regularization methods with adaptive accuracy requirements (Plenary Talk)	Stefania Bellavia
Sat 7:55 a.m. - 8:05 a.m.	Q&A with Stefania Bellavia (Plenary Speaker #1)
Sat 8:05 a.m. - 8:50 a.m.	Conjugate gradient techniques for nonconvex optimization (Plenary Talk)	Clément Royer
Sat 8:50 a.m. - 9:00 a.m.	Q&A with Clément Royer (Plenary Speaker #2)
Sat 9:00 a.m. - 9:20 a.m.	Break #1
Sat 9:20 a.m. - 10:05 a.m.	Algorithms for Deterministically Constrained Stochastic Optimization (Plenary Talk)	Frank E. Curtis
Sat 10:05 a.m. - 10:15 a.m.	Q&A with Frank E. Curtis (Plenary Speaker #3)
Sat 10:15 a.m. - 11:00 a.m.	SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality (Plenary Talk)	Courtney Paquette
Sat 11:00 a.m. - 11:10 a.m.	Q&A with Courtney Paquette (Plenary Speaker #4)
Sat 11:10 a.m. - 11:20 a.m.	Convergence Analysis and Implicit Regularization of Feedback Alignment for Deep Linear Networks (Spotlight Talk)	Manuela Girotti
Sat 11:20 a.m. - 11:30 a.m.	Computing the Newton-step faster than Hessian accumulation (Spotlight Talk)	Akshay Srinivasan
Sat 11:30 a.m. - 1:00 p.m.	Break #2
Sat 1:00 p.m. - 1:45 p.m.	Descent method framework in optimization (Plenary Talk)	Ashia Wilson
Sat 1:45 p.m. - 1:55 p.m.	Q&A with Ashia Wilson (Plenary Speaker #5)
Sat 1:55 p.m. - 2:40 p.m.	Faster Empirical Risk Minimization (Plenary Talk)	Jelena Diakonikolas
Sat 2:40 p.m. - 2:50 p.m.	Q&A with Jelena Diakonikolas (Plenary Speaker #6)
Sat 2:50 p.m. - 3:15 p.m.	Break #3
Sat 3:15 p.m. - 3:25 p.m.	Regularized Newton Method with Global O(1/k^2) Convergence (Spotlight Talk)	Konstantin Mishchenko
Sat 3:25 p.m. - 3:35 p.m.	Structured second-order methods via natural-gradient descent (Spotlight Talk)	Wu Lin
Sat 3:35 p.m. - 3:45 p.m.	Implicit Regularization in Overparameterized Bilevel Optimization (Spotlight Talk)	Paul Vicol
Sat 3:45 p.m. - 4:30 p.m.	Stochastic Variance-Reduced High-order Optimization for Nonconvex Optimization (Plenary Talk)	Quanquan Gu
Sat 4:30 p.m. - 4:40 p.m.	Q&A with Quanquan Gu (Plenary Speaker #7)
Sat 4:40 p.m. - 4:50 p.m.	Closing remarks	Albert S Berahas