Tutorial

Recent Advances in Stochastic Convex and Non-Convex Optimization

Zeyuan Allen-Zhu

Cockle Bay

Abstract:

In this tutorial, we will provide an accessible and extensive overview of recent advances in optimization methods based on stochastic gradient descent (SGD), for both convex and non-convex tasks. In particular, this tutorial will try to answer the following questions with theoretical support. How can we properly use momentum to speed up SGD? What is the maximum parallel speedup we can achieve for SGD? When should we use a dual or primal-dual approach to replace SGD? What is the difference between coordinate descent (e.g., SDCA) and SGD? How does variance reduction affect the performance of SGD? Why does second-order information help us improve the convergence of SGD?
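To make the first question concrete, here is a minimal illustrative Python sketch, not taken from the tutorial, comparing plain SGD with SGD plus heavy-ball momentum on a toy least-squares problem. The step size, momentum coefficient, and synthetic data below are arbitrary choices made purely for demonstration.

# Sketch: plain SGD vs. SGD with heavy-ball momentum on a toy least-squares
# problem. All hyperparameters and data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
x_true = rng.standard_normal(10)
b = A @ x_true

def run_sgd(momentum=0.0, lr=1e-3, epochs=50):
    x, v = np.zeros(10), np.zeros(10)
    for _ in range(epochs):
        for i in rng.permutation(len(b)):
            grad = (A[i] @ x - b[i]) * A[i]   # stochastic gradient of 0.5*(a_i^T x - b_i)^2
            v = momentum * v - lr * grad      # heavy-ball velocity update
            x = x + v
    return np.linalg.norm(A @ x - b)

print("SGD           residual:", run_sgd(momentum=0.0))
print("SGD+momentum  residual:", run_sgd(momentum=0.9))

With momentum set to 0 the velocity term reduces to an ordinary SGD step; a nonzero momentum coefficient accumulates past gradients, which is the mechanism whose acceleration properties the tutorial examines.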
