Invited talk in Workshop: Theoretical Physics for Deep Learning
Is Optimization a sufficient language to understand Deep Learning?
Speaker: Sanjeev Arora (Princeton/IAS)
Abstract: There is an old debate in neuroscience about whether or not learning has to boil down to optimizing a single cost function. This talk will suggest that even to understand mathematical properties of deep learning, we have to go beyond the conventional view of "optimizing a single cost function". The reason is that phenomena occur along the gradient descent trajectory that are not fully captured in the value of the cost function. I will illustrate briefly with three new results that involve such phenomena:
(i) (joint work with Cohen, Hu, and Luo) How deep matrix factorization solves matrix completion better than classical algorithms https://arxiv.org/abs/1905.13655
(ii) (joint with Du, Hu, Li, Salakhutdinov, and Wang) How to compute (exactly) with an infinitely wide net ("mean field limit", in physics terms) https://arxiv.org/abs/1904.11955
(iii) (joint with Kuditipudi, Wang, Hu, Lee, Zhang, Li, Ge) Explaining mode-connectivity for real-life deep nets (the phenomenon that low-cost solutions found by gradient descent are interconnected in the parameter space via low-cost paths; see Garipov et al. '18 and Draxler et al. '18)
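The following is a minimal, self-contained sketch (illustrative code, not the implementation from the paper) of the setup behind result (i): complete a low-rank matrix by running plain gradient descent on a depth-3 factorization W3 W2 W1, with the loss measured only on the observed entries. The matrix size, observation fraction, step size, and the small initialization scale are all assumed, illustrative choices.

import jax
import jax.numpy as jnp

# Illustrative sizes and hyperparameters (not from the paper).
n, rank, depth, lr, steps = 20, 2, 3, 0.02, 10000

k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)

# Ground-truth low-rank matrix and a random mask of observed entries.
M = jax.random.normal(k1, (n, rank)) @ jax.random.normal(k2, (rank, n)) / jnp.sqrt(n)
mask = jax.random.uniform(k3, (n, n)) < 0.3

# Depth-3 factorization initialized at a small scale (the implicit bias toward
# low rank depends on starting near zero).
Ws = [0.05 * jax.random.normal(k, (n, n)) for k in jax.random.split(k4, depth)]

def loss(Ws):
    P = Ws[0]
    for W in Ws[1:]:
        P = W @ P                                 # product W_depth ... W_2 W_1
    return 0.5 * jnp.sum((mask * (P - M)) ** 2)   # error on observed entries only

grad_fn = jax.jit(jax.grad(loss))
for _ in range(steps):
    Ws = [W - lr * g for W, g in zip(Ws, grad_fn(Ws))]

P = Ws[2] @ Ws[1] @ Ws[0]
# The loss only sees the observed entries; the question the talk addresses is
# why gradient descent also tends to recover the unobserved ones.
print("observed RMSE:  ", jnp.sqrt(jnp.mean((P - M)[mask] ** 2)))
print("unobserved RMSE:", jnp.sqrt(jnp.mean((P - M)[~mask] ** 2)))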
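For result (ii), a sketch of the underlying computation: in the infinite-width limit, the function learned by gradient descent can be evaluated exactly through kernel regression with the Neural Tangent Kernel. The snippet below uses the closed-form NTK of a two-layer ReLU network under the standard NTK parametrization (unit-norm inputs, no biases); the toy data and all sizes are assumptions for illustration, not taken from the paper.

import jax
import jax.numpy as jnp

def relu_ntk(X1, X2):
    # Closed-form infinite-width NTK of f(x) = (1/sqrt(m)) sum_r a_r ReLU(w_r . x)
    # with a_r, w_r ~ N(0, 1) and both layers trained; rows of X1, X2 are unit-norm.
    u = jnp.clip(X1 @ X2.T, -1.0, 1.0)                 # cosines between inputs
    theta = jnp.arccos(u)
    k_nngp = (jnp.sin(theta) + (jnp.pi - theta) * u) / (2 * jnp.pi)
    k_deriv = (jnp.pi - theta) / (2 * jnp.pi)
    return u * k_deriv + k_nngp

def unit_rows(A):
    return A / jnp.linalg.norm(A, axis=1, keepdims=True)

# Toy data (illustrative, not from the paper): labels are the sign of a random
# linear function of unit-norm inputs.
kx, kt, kw = jax.random.split(jax.random.PRNGKey(0), 3)
Xtr = unit_rows(jax.random.normal(kx, (100, 5)))
Xte = unit_rows(jax.random.normal(kt, (20, 5)))
w = jax.random.normal(kw, (5,))
ytr, yte = jnp.sign(Xtr @ w), jnp.sign(Xte @ w)

# Mean prediction of the infinitely wide net trained to convergence on the
# squared loss = kernel regression with the NTK.
K = relu_ntk(Xtr, Xtr) + 1e-6 * jnp.eye(len(Xtr))      # small jitter for stability
pred = relu_ntk(Xte, Xtr) @ jnp.linalg.solve(K, ytr)
print("test accuracy of the infinite-width net:", jnp.mean(jnp.sign(pred) == yte))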
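For result (iii), a sketch of the phenomenon being explained rather than the explanation itself: two solutions found by gradient descent from different random initializations are joined by a low-loss quadratic Bezier curve whose control point is trained, in the spirit of Garipov et al. '18. The tiny network, synthetic data, and hyperparameters are illustrative assumptions; the barrier along the straight line is far more pronounced for real-life deep nets than in this toy setting.

import jax
import jax.numpy as jnp
from jax import tree_util

# Toy data and a tiny one-hidden-layer net (illustrative choices, not from the paper).
kx, kw, k1, k2 = jax.random.split(jax.random.PRNGKey(1), 4)
X = jax.random.normal(kx, (256, 10))
y = jnp.tanh(X @ jax.random.normal(kw, (10,)))

def init(k, hidden=32):
    ka, kb = jax.random.split(k)
    return {"W1": 0.3 * jax.random.normal(ka, (10, hidden)),
            "W2": 0.3 * jax.random.normal(kb, (hidden,))}

def loss(p):
    return jnp.mean((jnp.tanh(X @ p["W1"]) @ p["W2"] - y) ** 2)

def train(p, steps=3000, lr=0.1):
    g = jax.jit(jax.grad(loss))
    for _ in range(steps):
        p = tree_util.tree_map(lambda w, dw: w - lr * dw, p, g(p))
    return p

pa, pb = train(init(k1)), train(init(k2))   # two independent low-loss solutions

def on_path(bend, t):
    # Quadratic Bezier curve with endpoints pa, pb and trainable control point `bend`;
    # choosing bend = (pa + pb) / 2 makes it exactly the straight-line path.
    return tree_util.tree_map(
        lambda a, b, c: (1 - t) ** 2 * a + 2 * t * (1 - t) * b + t ** 2 * c,
        pa, bend, pb)

ts = jnp.linspace(0.0, 1.0, 11)

def path_loss(bend):
    return jnp.mean(jax.vmap(lambda t: loss(on_path(bend, t)))(ts))

# Train the control point so the whole curve stays in a low-loss region.
bend = tree_util.tree_map(lambda a, b: 0.5 * (a + b), pa, pb)
path_grad = jax.jit(jax.grad(path_loss))
for _ in range(2000):
    bend = tree_util.tree_map(lambda w, dw: w - 0.1 * dw, bend, path_grad(bend))

mid = tree_util.tree_map(lambda a, b: 0.5 * (a + b), pa, pb)
print("loss at the two endpoints:     ", loss(pa), loss(pb))
print("max loss on the straight line: ", max(loss(on_path(mid, t)) for t in ts))
print("max loss on the trained curve: ", max(loss(on_path(bend, t)) for t in ts))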