It is widely conjectured that training algorithms for neural networks are successful because all local minima lead to similar performance; for example, see (LeCun et al., 2015; Choromanska et al., 2015; Dauphin et al., 2014). Performance is typically measured in terms of two metrics: training performance and generalization performance. Here we focus on the training performance of neural networks for binary classification, and provide conditions under which the training error is zero at all local minima of appropriately chosen surrogate loss functions. Our conditions are roughly of the following form: the neurons have to be increasing and strictly convex, the neural network should either be single-layered or be multi-layered with a shortcut-like connection, and the surrogate loss function should be a smooth version of the hinge loss. We also provide counterexamples showing that, when these conditions are relaxed, the result may not hold.
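To make the ingredients above concrete, here is a minimal sketch of the kind of objective the abstract describes. The specific choices — softplus as an increasing, strictly convex neuron and the squared hinge as a smooth surrogate of the hinge loss — are illustrative assumptions, not necessarily the exact functions used in the paper:

```python
import numpy as np

def softplus(x):
    # Softplus is increasing and strictly convex (its second derivative,
    # sigmoid'(x), is positive everywhere). Written in a numerically stable form.
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def smoothed_hinge(margin):
    # Squared hinge: one common smooth (C^1) surrogate of the hinge loss.
    # Illustrative choice; the paper's exact surrogate may differ.
    return np.maximum(0.0, 1.0 - margin) ** 2

# One-hidden-layer network for binary classification:
# f(x) = sum_j v_j * softplus(w_j . x), with labels y in {-1, +1}.
# The training objective is the mean surrogate loss on the margins y * f(x).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))      # 8 samples, 3 features (toy data)
y = np.sign(rng.normal(size=8))  # labels in {-1, +1}
W = rng.normal(size=(4, 3))      # 4 hidden neurons
v = rng.normal(size=4)           # output-layer weights
f = softplus(X @ W.T) @ v        # network outputs
train_loss = smoothed_hinge(y * f).mean()
```

A sample is classified correctly with margin at least 1 exactly when its surrogate loss is zero, so "zero training error at all local minima" can be read as: every local minimum of `train_loss` (in `W`, `v`) drives all per-sample surrogate losses to zero.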
Shiyu Liang (UIUC)
Ruoyu Sun (University of Illinois at Urbana-Champaign)
Yixuan Li (Facebook Inc)
R Srikant (UIUC)
Related Events (a corresponding poster, oral, or spotlight)
2018 Oral: Understanding the Loss Surface of Neural Networks for Binary Classification »
Thu Jul 12th 12:20 -- 12:30 PM Room K1