Vanishing and exploding gradients are two of the main obstacles in training deep neural networks, especially in capturing long-range dependencies in recurrent neural networks (RNNs). In this paper, we present an efficient parameterization of the transition matrix of an RNN that allows us to stabilize the gradients that arise in its training. Specifically, we parameterize the transition matrix by its singular value decomposition (SVD), which allows us to explicitly track and control its singular values. We attain efficiency by using tools that are common in numerical linear algebra, namely Householder reflectors, to represent the orthogonal matrices that arise in the SVD. By explicitly controlling the singular values, our proposed Spectral-RNN method allows us to easily solve the exploding gradient problem, and we observe that it empirically solves the vanishing gradient issue to a large extent. We note that the SVD parameterization can be used for any rectangular weight matrix, so it extends easily to any deep neural network, such as a multi-layer perceptron. Theoretically, we demonstrate that our parameterization does not lose any expressive power, and show how it can make the optimization process easier. Our extensive experimental results also demonstrate that the proposed framework converges faster and generalizes well, especially in capturing long-range dependencies, as shown on the synthetic addition and copy tasks, as well as on the MNIST and Penn Treebank datasets.
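To make the idea concrete, here is a minimal NumPy sketch of the parameterization the abstract describes: an orthogonal factor built as a product of Householder reflectors, and a transition matrix W = U diag(σ) Vᵀ whose singular values are kept near 1. The full-dimensional reflectors and the tanh-based squashing of σ into (1 − r, 1 + r) are illustrative assumptions for this sketch, not necessarily the paper's exact scheme.

```python
import numpy as np

def householder_orthogonal(reflectors, n):
    """Orthogonal n x n matrix built as a product of Householder
    reflectors H(v) = I - 2 v v^T / (v^T v)."""
    Q = np.eye(n)
    for v in reflectors:
        # Left-multiply Q by H(v) without forming H explicitly: O(n^2) per reflector.
        Q -= 2.0 * np.outer(v, v @ Q) / (v @ v)
    return Q

def spectral_transition_matrix(u_reflectors, v_reflectors, sigma_params, r=0.05):
    """W = U diag(sigma) V^T with each singular value squashed into
    (1 - r, 1 + r), so products of W along a long sequence can neither
    explode nor shrink the gradient norm by more than (1 + r)^T / (1 - r)^T."""
    n = sigma_params.shape[0]
    U = householder_orthogonal(u_reflectors, n)
    V = householder_orthogonal(v_reflectors, n)
    sigma = 1.0 + r * np.tanh(sigma_params)  # illustrative bounding of the spectrum
    return U @ np.diag(sigma) @ V.T

# Hypothetical usage: 64-dim hidden state, 16 reflectors per orthogonal factor.
rng = np.random.default_rng(0)
n, k = 64, 16
u_refl = [rng.standard_normal(n) for _ in range(k)]
v_refl = [rng.standard_normal(n) for _ in range(k)]
W = spectral_transition_matrix(u_refl, v_refl, rng.standard_normal(n), r=0.05)
print(np.linalg.svd(W, compute_uv=False)[[0, -1]])  # extremes lie in [0.95, 1.05]
```

In training, the reflector vectors and the raw σ parameters would be the learned quantities; since every value of these parameters yields exactly orthogonal U and V, gradient descent can update them freely while the spectrum of W stays inside the prescribed band.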
Author Information
Jiong Zhang (University of Texas at Austin)
Qi Lei (University of Texas at Austin)
Inderjit Dhillon (UT Austin & Amazon)
Inderjit Dhillon is the Gottesman Family Centennial Professor of Computer Science and Mathematics at UT Austin, where he is also the Director of the ICES Center for Big Data Analytics. His main research interests are in big data, machine learning, network analysis, linear algebra, and optimization. He received his B.Tech. degree from IIT Bombay and his Ph.D. from UC Berkeley. Inderjit has received several awards, including the ICES Distinguished Research Award, the SIAM Outstanding Paper Prize, the Moncrief Grand Challenge Award, the SIAM Linear Algebra Prize, the University Research Excellence Award, and the NSF CAREER Award. He has published over 160 journal and conference papers, and has served on the editorial boards of the Journal of Machine Learning Research, the IEEE Transactions on Pattern Analysis and Machine Intelligence, Foundations and Trends in Machine Learning, and the SIAM Journal on Matrix Analysis and Applications. Inderjit is an ACM Fellow, an IEEE Fellow, a SIAM Fellow, and an AAAS Fellow.
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Oral: Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization »
  Wed. Jul 11th 03:40 -- 03:50 PM, Room Victoria
More from the Same Authors
- 2022 : Positive Unlabeled Contrastive Representation Learning »
  Anish Acharya · Sujay Sanghavi · Li Jing · Bhargav Bhushanam · Michael Rabbat · Dhruv Choudhary · Inderjit Dhillon
- 2023 : UCB Provably Learns From Inconsistent Human Feedback »
  Shuo Yang · Tongzheng Ren · Inderjit Dhillon · Sujay Sanghavi
- 2023 Poster: Representer Point Selection for Explaining Regularized High-dimensional Models »
  Che-Ping Tsai · Jiong Zhang · Hsiang-Fu Yu · Eli Chien · Cho-Jui Hsieh · Pradeep Ravikumar
- 2023 Poster: PINA: Leveraging Side Information in eXtreme Multi-label Classification via Predicted Instance Neighborhood Aggregation »
  Eli Chien · Jiong Zhang · Cho-Jui Hsieh · Jyun-Yu Jiang · Wei-Cheng Chang · Olgica Milenkovic · Hsiang-Fu Yu
- 2022 Poster: Linear Bandit Algorithms with Sublinear Time Complexity »
  Shuo Yang · Tongzheng Ren · Sanjay Shakkottai · Eric Price · Inderjit Dhillon · Sujay Sanghavi
- 2022 Spotlight: Linear Bandit Algorithms with Sublinear Time Complexity »
  Shuo Yang · Tongzheng Ren · Sanjay Shakkottai · Eric Price · Inderjit Dhillon · Sujay Sanghavi
- 2021 Poster: Top-k eXtreme Contextual Bandits with Arm Hierarchy »
  Rajat Sen · Alexander Rakhlin · Lexing Ying · Rahul Kidambi · Dean Foster · Daniel Hill · Inderjit Dhillon
- 2021 Spotlight: Top-k eXtreme Contextual Bandits with Arm Hierarchy »
  Rajat Sen · Alexander Rakhlin · Lexing Ying · Rahul Kidambi · Dean Foster · Daniel Hill · Inderjit Dhillon
- 2020 : Discussion Panel »
  Krzysztof Dembczynski · Prateek Jain · Alina Beygelzimer · Inderjit Dhillon · Anna Choromanska · Maryam Majzoubi · Yashoteja Prabhu · John Langford
- 2020 : Invited Talk 5 Q&A - Inderjit Dhillon »
  Inderjit Dhillon
- 2020 : Invited Talk 5 - Multi-Output Prediction: Theory and Practice - Inderjit Dhillon »
  Inderjit Dhillon
- 2020 Poster: SGD Learns One-Layer Networks in WGANs »
  Qi Lei · Jason Lee · Alexandros Dimakis · Constantinos Daskalakis
- 2020 Poster: Learning to Encode Position for Transformer with Continuous Dynamical Model »
  Xuanqing Liu · Hsiang-Fu Yu · Inderjit Dhillon · Cho-Jui Hsieh
- 2020 Poster: Extreme Multi-label Classification from Aggregated Labels »
  Yanyao Shen · Hsiang-Fu Yu · Sujay Sanghavi · Inderjit Dhillon
- 2019 : Poster Session I »
  Nicholas Rhinehart · Yunhao Tang · Vinay Prabhu · Dian Ang Yap · Alexander Wang · Marc Finzi · Manoj Kumar · You Lu · Abhishek Kumar · Qi Lei · Michael Przystupa · Nicola De Cao · Polina Kirichenko · Pavel Izmailov · Andrew Wilson · Jakob Kruse · Diego Mesquita · Mario Lezcano Casado · Thomas Müller · Keir Simmons · Andrei Atanov
- 2018 Poster: Learning long term dependencies via Fourier recurrent units »
  Jiong Zhang · Yibo Lin · Zhao Song · Inderjit Dhillon
- 2018 Poster: Towards Fast Computation of Certified Robustness for ReLU Networks »
  Tsui-Wei Weng · Huan Zhang · Hongge Chen · Zhao Song · Cho-Jui Hsieh · Luca Daniel · Duane Boning · Inderjit Dhillon
- 2018 Oral: Towards Fast Computation of Certified Robustness for ReLU Networks »
  Tsui-Wei Weng · Huan Zhang · Hongge Chen · Zhao Song · Cho-Jui Hsieh · Luca Daniel · Duane Boning · Inderjit Dhillon
- 2018 Oral: Learning long term dependencies via Fourier recurrent units »
  Jiong Zhang · Yibo Lin · Zhao Song · Inderjit Dhillon
- 2017 Poster: Gradient Coding: Avoiding Stragglers in Distributed Learning »
  Rashish Tandon · Qi Lei · Alexandros Dimakis · Nikos Karampatziakis
- 2017 Poster: Gradient Boosted Decision Trees for High Dimensional Sparse Output »
  Si Si · Huan Zhang · Sathiya Keerthi · Dhruv Mahajan · Inderjit Dhillon · Cho-Jui Hsieh
- 2017 Talk: Gradient Coding: Avoiding Stragglers in Distributed Learning »
  Rashish Tandon · Qi Lei · Alexandros Dimakis · Nikos Karampatziakis
- 2017 Talk: Gradient Boosted Decision Trees for High Dimensional Sparse Output »
  Si Si · Huan Zhang · Sathiya Keerthi · Dhruv Mahajan · Inderjit Dhillon · Cho-Jui Hsieh
- 2017 Poster: Recovery Guarantees for One-hidden-layer Neural Networks »
  Kai Zhong · Zhao Song · Prateek Jain · Peter Bartlett · Inderjit Dhillon
- 2017 Poster: Doubly Greedy Primal-Dual Coordinate Descent for Sparse Empirical Risk Minimization »
  Qi Lei · En-Hsu Yen · Chao-Yuan Wu · Inderjit Dhillon · Pradeep Ravikumar
- 2017 Talk: Doubly Greedy Primal-Dual Coordinate Descent for Sparse Empirical Risk Minimization »
  Qi Lei · En-Hsu Yen · Chao-Yuan Wu · Inderjit Dhillon · Pradeep Ravikumar
- 2017 Talk: Recovery Guarantees for One-hidden-layer Neural Networks »
  Kai Zhong · Zhao Song · Prateek Jain · Peter Bartlett · Inderjit Dhillon