Despite significant recent advances in deep neural networks, training them remains challenging due to the highly non-convex nature of the objective function. State-of-the-art methods rely on error backpropagation, which suffers from several well-known issues, such as vanishing and exploding gradients, the inability to handle non-differentiable nonlinearities or to parallelize weight updates across layers, and biological implausibility. These limitations continue to motivate the exploration of alternative training algorithms, including several recently proposed auxiliary-variable methods, which break the complex nested objective function into local subproblems; this avoids gradient chains, and hence the vanishing-gradient issue, and allows weight updates to be parallelized, among other advantages. However, those techniques are mainly offline (batch) methods, which limits their applicability to extremely large datasets or unlimited data streams arising in online, continual, or reinforcement learning. The main contribution of our work is a novel online (stochastic/mini-batch) alternating minimization (AM) algorithm for training deep neural networks, together with the first theoretical convergence guarantees for AM in stochastic settings and an extensive empirical evaluation on a variety of architectures and datasets, demonstrating the advantages of the proposed approach over both offline auxiliary-variable methods and backpropagation-based stochastic gradient descent.
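The abstract describes the approach only at a high level. As a rough illustration of the general auxiliary-variable / alternating-minimization idea it refers to (not the authors' actual algorithm or code), the sketch below trains a tiny two-layer ReLU regression network on streaming mini-batches: an auxiliary variable `a` stands in for the hidden activations, each weight matrix is then updated by solving only its own local subproblem, and no gradient is propagated through more than one layer. All names (`get_batch`, `rho`, the learning rates) and the quadratic-penalty formulation are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy streaming data: regress y = sin(x) from scalar inputs, one mini-batch at a time.
def get_batch(n=128):
    x = rng.uniform(-3, 3, size=(n, 1))
    return x, np.sin(x)

d_in, d_hid, d_out = 1, 32, 1
W1 = rng.normal(0, 0.5, (d_in, d_hid))
W2 = rng.normal(0, 0.5, (d_hid, d_out))

rho, lam = 1.0, 1e-2      # auxiliary-mismatch penalty and ridge regularizer (illustrative values)
lr_a, lr_w1 = 0.05, 0.05
relu = lambda z: np.maximum(z, 0.0)

for step in range(2001):
    x, y = get_batch()

    # (1) Initialize the auxiliary hidden activations with a forward pass.
    a = relu(x @ W1)

    # (2) Update the auxiliary variables: a few gradient steps on the *local*
    #     objective ||a W2 - y||^2 + rho ||a - relu(x W1)||^2 (no long gradient chain).
    for _ in range(5):
        grad_a = 2 * (a @ W2 - y) @ W2.T + 2 * rho * (a - relu(x @ W1))
        a -= lr_a * grad_a

    # (3) Update W2 in closed form: a ridge-regularized least-squares fit of y from a.
    #     (Fully refitting per mini-batch is a simplification; online variants damp this step.)
    W2 = np.linalg.solve(a.T @ a + lam * np.eye(d_hid), a.T @ y)

    # (4) Update W1 on its local reconstruction problem ||a - relu(x W1)||^2.
    pre = x @ W1
    grad_W1 = x.T @ (2 * (relu(pre) - a) * (pre > 0)) / len(x)
    W1 -= lr_w1 * grad_W1

    if step % 500 == 0:
        xt, yt = get_batch(512)
        mse = np.mean((relu(xt @ W1) @ W2 - yt) ** 2)
        print(f"step {step:4d}  test MSE {mse:.4f}")
```

Because each sub-update in steps (3) and (4) touches only one layer, the weight updates could in principle be computed in parallel, which is one of the advantages of auxiliary-variable methods that the abstract highlights.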
Author Information
Anna Choromanska (New York University)
Benjamin Cowen (NYU)
Sadhana Kumaravel (IBM Research)
Ronny Luss (IBM Research)
Mattia Rigotti (IBM Research AI)
Irina Rish (IBM Research AI)
Paolo DiAchille (IBM Research)
Viatcheslav Gurev (IBM Research)
Brian Kingsbury (IBM Research)
Ravi Tejwani (MIT)
Djallel Bouneffouf (IBM Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Beyond Backprop: Online Alternating Minimization with Auxiliary Variables »
  Wed. Jun 12th 01:30 -- 04:00 AM, Room: Pacific Ballroom #57
More from the Same Authors
- 2020 Poster: Enhancing Simple Models by Exploiting What They Already Know »
  Amit Dhurandhar · Karthikeyan Shanmugam · Ronny Luss
- 2019: posters »
  Zhengxing Chen · Juan Jose Garau Luis · Ignacio Albert Smet · Aditya Modi · Sabina Tomkins · Riley Simmons-Edler · Hongzi Mao · Alexander Irpan · Hao Lu · Rose Wang · Subhojyoti Mukherjee · Aniruddh Raghu · Syed Arbab Mohd Shihab · Byung Hoon Ahn · Rasool Fakoor · Pratik Chaudhari · Elena Smirnova · Min-hwan Oh · Xiaocheng Tang · Tony Qin · Qingyang Li · Marc Brittain · Ian Fox · Supratik Paul · Xiaofeng Gao · Yinlam Chow · Gabriel Dulac-Arnold · Ofir Nachum · Nikos Karampatziakis · Bharathan Balaji · Supratik Paul · Ali Davody · Djallel Bouneffouf · Himanshu Sahni · Soo Kim · Andrey Kolobov · Alexander Amini · Yao Liu · Xinshi Chen · Craig Boutilier
- 2019 Poster: Estimating Information Flow in Deep Neural Networks »
  Ziv Goldfeld · Ewout van den Berg · Kristjan Greenewald · Igor Melnyk · Nam Nguyen · Brian Kingsbury · Yury Polyanskiy
- 2019 Oral: Estimating Information Flow in Deep Neural Networks »
  Ziv Goldfeld · Ewout van den Berg · Kristjan Greenewald · Igor Melnyk · Nam Nguyen · Brian Kingsbury · Yury Polyanskiy
- 2017 Poster: Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation »
  Yacine Jernite · Anna Choromanska · David Sontag
- 2017 Talk: Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation »
  Yacine Jernite · Anna Choromanska · David Sontag