Efficient Full-Matrix Adaptive Regularization
Naman Agarwal · Brian Bullins · Xinyi Chen · Elad Hazan · Karan Singh · Cyril Zhang · Yi Zhang

Thu Jun 13 06:30 PM -- 09:00 PM (PDT) @ Pacific Ballroom #209

Adaptive regularization methods pre-multiply a descent direction by a preconditioning matrix. Due to the large number of parameters of machine learning problems, full-matrix preconditioning methods are prohibitively expensive. We show how to modify full-matrix adaptive regularization in order to make it practical and effective. We also provide a novel theoretical analysis for adaptive regularization in non-convex optimization settings. The core of our algorithm, termed GGT, consists of the efficient computation of the inverse square root of a low-rank matrix. Our preliminary experiments show improved iteration-wise convergence rates across synthetic tasks and standard deep learning benchmarks, and that the more carefully preconditioned steps sometimes lead to a better solution.
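
To illustrate the low-rank inverse square root mentioned in the abstract, here is a minimal NumPy sketch. It is not taken from the abstract, which gives no implementation details. It assumes the preconditioner has the regularized form ((G G^T)^{1/2} + eps I)^{-1}, where G is a d x r matrix whose columns are recent gradients and r is much smaller than d. The function name `apply_low_rank_preconditioner`, the parameter `eps`, and the truncation threshold are hypothetical choices made for illustration. The idea is that the eigendecomposition of the small r x r matrix G^T G yields the nonzero spectrum of G G^T, so the d x d matrix never has to be formed.

```python
import numpy as np


def apply_low_rank_preconditioner(G, v, eps=1e-4):
    """Return ((G G^T)^{1/2} + eps * I)^{-1} @ v without forming G G^T.

    G   : (d, r) array, assumed here to hold r recent gradients as columns.
    v   : (d,) array, the direction to precondition.
    eps : positive regularizer (a hypothetical default, not from the source).
    """
    # Eigendecompose the small r x r Gram matrix instead of the d x d one.
    gram = G.T @ G
    evals, V = np.linalg.eigh(gram)
    # Eigenvalues of G^T G are squared singular values of G; clip round-off.
    sigma = np.sqrt(np.clip(evals, 0.0, None))

    # Drop numerically zero directions so the division below is safe.
    keep = sigma > 1e-10 * sigma.max()
    sigma = sigma[keep]
    # Left singular vectors of G: U = G V Sigma^{-1}, shape (d, k).
    U = (G @ V[:, keep]) / sigma

    # On span(U) the operator scales by 1 / (sigma + eps);
    # on the orthogonal complement it scales by 1 / eps.
    coeff = U.T @ v
    return v / eps + U @ ((1.0 / (sigma + eps) - 1.0 / eps) * coeff)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.standard_normal((1000, 10))  # d = 1000, r = 10
    g = rng.standard_normal(1000)
    step = apply_low_rank_preconditioner(G, g, eps=1e-2)
    print(step.shape)
```

Under these assumptions the cost is dominated by the products with G, roughly O(d r^2) for the Gram matrix and O(d r) per application, compared with O(d^3) for a dense inverse square root.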

Author Information

Naman Agarwal (Google AI Princeton)
Brian Bullins (Princeton University)
Xinyi Chen (Google Research)
Elad Hazan (Princeton University)
Karan Singh (Princeton University)
Cyril Zhang (Princeton University)
Yi Zhang (Princeton University)
