Timezone: »

Selfish Sparse RNN Training
Shiwei Liu · Decebal Mocanu · Yulong Pei · Mykola Pechenizkiy

Wed Jul 21 05:30 PM -- 05:35 PM (PDT) @

Sparse neural networks have been widely applied to reduce the computational demands of training and deploying over-parameterized deep neural networks. For inference acceleration, methods that discover a sparse network from a pre-trained dense network (dense-to-sparse training) work effectively. Recently, dynamic sparse training (DST) has been proposed to train sparse neural networks without pre-training a dense model (sparse-to-sparse training), so that the training process can also be accelerated. However, previous sparse-to-sparse methods mainly focus on Multilayer Perceptron Networks (MLPs) and Convolutional Neural Networks (CNNs), failing to match the performance of dense-to-sparse methods in the Recurrent Neural Networks (RNNs) setting. In this paper, we propose an approach to train intrinsically sparse RNNs with a fixed parameter count in one single run, without compromising performance. During training, we allow RNN layers to have a non-uniform redistribution across cell gates for better regularization. Further, we propose SNT-ASGD, a novel variant of the averaged stochastic gradient optimizer, which significantly improves the performance of all sparse training methods for RNNs. Using these strategies, we achieve state-of-the-art sparse training results, better than the dense-to-sparse methods, with various types of RNNs on Penn TreeBank and Wikitext-2 datasets. Our codes are available at https://github.com/Shiweiliuiiiiiii/Selfish-RNN.

Author Information

Shiwei Liu (Eindhoven University of Technology)

Shiwei Liu is a Postdoctoral Fellow at the University of Texas at Austin. He obtained his Ph.D. from the Eindhoven University of Technology in 2022. His research interests cover sparsity in neural networks and efficient ML. He has over 30 publications in top-tier machine learning conferences, such as IJCAI, ICLR, ICML, NeurIPS, IJCV, UAI, and LoG. Shiwei won the best paper award at the LoG’22 conference and the Cum Laude (distinguished Ph.D. thesis) at the Eindhoven University of Technology. He has served as an area chair in ICIP‘22 and ICIP’23; and a PC member of almost all top-tier ML/CV conferences. Shiwei has co-organized two tutorials in IJCAI and ECML-PKDD, which were widely acclaimed by the audience. He has also given more than 20 invited talks at many universities, companies, research labs, and conferences.

Decebal Mocanu (University of Twente)
Yulong Pei (TU Eindhoven)
Mykola Pechenizkiy (TU Eindhoven)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors