Dynamic Pruning of a Neural Network via Gradient Signal-to-Noise Ratio
Julien N Siems · Aaron Klein · Cedric Archambeau · Maren Mahsereci

While training highly overparameterized neural networks is common practice in deep learning, research into post-hoc weight pruning suggests that more than 90% of parameters can be removed without loss in predictive performance. To save resources, zero-shot and one-shot pruning attempt to find such a sparse representation at initialization or at an early stage of training. Though efficient, these approaches offer no justification for why the sparsity structure should not change during training. Dynamic sparsity pruning removes this limitation and allows the structure of the sparse neural network to adapt during training. Recent approaches rely on weight magnitude pruning, which has been shown to be sub-optimal when applied at earlier training stages. In this work we propose to use the gradient noise to make pruning decisions. The procedure enables us to automatically adjust the sparsity during training without imposing a hand-designed sparsity schedule, while at the same time being able to recover from previous pruning decisions by unpruning connections as necessary. We evaluate our new method on image and tabular datasets and demonstrate that we reach performance similar to that of the dense model from which we extract the sparse network, while exposing fewer hyperparameters than other dynamic sparsity methods.
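
The abstract does not spell out how the gradient signal-to-noise ratio is computed or turned into a pruning decision, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes the SNR of a parameter is the squared mean of its per-example gradients divided by the estimated variance of that mean, and it assumes a fixed cutoff (`threshold`, a hypothetical parameter). The function names `gradient_snr` and `snr_mask` are likewise made up for illustration.

```python
import numpy as np

def gradient_snr(per_example_grads, eps=1e-12):
    """Per-parameter SNR (assumed definition): squared mean gradient
    divided by the estimated variance of the mean gradient."""
    n = per_example_grads.shape[0]
    mean = per_example_grads.mean(axis=0)
    var = per_example_grads.var(axis=0, ddof=1)  # unbiased sample variance
    return mean ** 2 / (var / n + eps)

def snr_mask(per_example_grads, threshold=10.0):
    """True where a parameter is kept; the threshold is a hypothetical cutoff."""
    return gradient_snr(per_example_grads) > threshold

# Toy per-example gradients: rows are examples, columns are parameters.
# Early in training, parameter 0 has a consistent gradient while
# parameter 1 flips sign from example to example (pure noise).
early = np.array([[1.0,  1.0],
                  [1.1, -1.0],
                  [0.9,  1.0],
                  [1.0, -1.0]])

# Later, the roles are reversed: parameter 1 now carries a clear signal.
late = np.array([[ 0.0, 2.0],
                 [ 0.1, 2.1],
                 [-0.1, 1.9],
                 [ 0.0, 2.0]])

mask_early = snr_mask(early)
mask_late = snr_mask(late)
print(mask_early, mask_late)
```

Because the mask is recomputed from fresh gradient statistics rather than accumulated, a connection pruned at one step (parameter 1 in `early`) can be restored at a later step (parameter 1 in `late`), which mirrors the unpruning behavior the abstract describes. The sparsity level also falls out of the statistics instead of being fixed by a schedule.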

Author Information

Julien N Siems (Universität Freiburg)
Aaron Klein (AWS Berlin)
Cedric Archambeau (Amazon)
Maren Mahsereci (Amazon)

More from the Same Authors

  • 2021 : A resource-efficient method for repeated HPO and NAS problems
    Giovanni Zappella · David Salinas · Cedric Archambeau
  • 2021 Poster: BORE: Bayesian Optimization by Density-Ratio Estimation
    Louis Chi-Chun Tiao · Aaron Klein · Matthias W Seeger · Edwin V Bonilla · Cedric Archambeau · Fabio Ramos
  • 2021 Oral: BORE: Bayesian Optimization by Density-Ratio Estimation
    Louis Chi-Chun Tiao · Aaron Klein · Matthias W Seeger · Edwin V Bonilla · Cedric Archambeau · Fabio Ramos
  • 2019 : Poster Session 1 (all papers)
    Matilde Gargiani · Yochai Zur · Chaim Baskin · Evgenii Zheltonozhskii · Liam Li · Ameet Talwalkar · Xuedong Shang · Harkirat Singh Behl · Atilim Gunes Baydin · Ivo Couckuyt · Tom Dhaene · Chieh Lin · Wei Wei · Min Sun · Orchid Majumder · Michele Donini · Yoshihiko Ozaki · Ryan P. Adams · Christian Geißler · Ping Luo · zhanglin peng · Ruimao Zhang · John Langford · Rich Caruana · Debadeepta Dey · Charles Weill · Xavi Gonzalvo · Scott Yang · Scott Yak · Eugen Hotaj · Vladimir Macko · Mehryar Mohri · Corinna Cortes · Stefan Webb · Jonathan Chen · Martin Jankowiak · Noah Goodman · Aaron Klein · Frank Hutter · Mojan Javaheripi · Mohammad Samragh · Sungbin Lim · Taesup Kim · SUNGWOONG KIM · Michael Volpp · Iddo Drori · Yamuna Krishnamurthy · Kyunghyun Cho · Stanislaw Jastrzebski · Quentin de Laroussilhe · Mingxing Tan · Xiao Ma · Neil Houlsby · Andrea Gesmundo · Zalán Borsos · Krzysztof Maziarz · Felipe Petroski Such · Joel Lehman · Kenneth Stanley · Jeff Clune · Pieter Gijsbers · Joaquin Vanschoren · Felix Mohr · Eyke Hüllermeier · Zheng Xiong · Wenpeng Zhang · wenwu zhu · Weijia Shao · Aleksandra Faust · Michal Valko · Michael Y Li · Hugo Jair Escalante · Marcel Wever · Andrey Khorlin · Tara Javidi · Anthony Francis · Saurajit Mukherjee · Jungtaek Kim · Michael McCourt · Saehoon Kim · Tackgeun You · Seungjin Choi · Nicolas Knudde · Alexander Tornede · Ghassen Jerfel
  • 2019 Poster: NAS-Bench-101: Towards Reproducible Neural Architecture Search
    Chris Ying · Aaron Klein · Eric Christiansen · Esteban Real · Kevin Murphy · Frank Hutter
  • 2019 Oral: NAS-Bench-101: Towards Reproducible Neural Architecture Search
    Chris Ying · Aaron Klein · Eric Christiansen · Esteban Real · Kevin Murphy · Frank Hutter
  • 2018 Poster: BOHB: Robust and Efficient Hyperparameter Optimization at Scale
    Stefan Falkner · Aaron Klein · Frank Hutter
  • 2018 Oral: BOHB: Robust and Efficient Hyperparameter Optimization at Scale
    Stefan Falkner · Aaron Klein · Frank Hutter