Sparsity and low-rank structures have been incorporated into neural networks to reduce computational complexity and to improve generalization and robustness. Recent theoretical developments show that both are natural characteristics of solutions to data-fitting problems posed in a new family of Banach spaces referred to as RBV2 spaces, the spaces of second-order bounded variation in the Radon domain. Moreover, sparse and deep ReLU networks are solutions to infinite-dimensional variational problems over compositions of these spaces. This means that these learning problems can be recast as parametric optimizations over neural network weights. Remarkably, standard weight decay and its variants correspond exactly to regularizing the RBV2-norm in the function space. Empirical validation in this paper confirms that weight decay leads to sparse and low-rank networks, as predicted by the theory.
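To make the claim concrete, the sketch below shows one way to probe it empirically: train a small ReLU network with weight decay and then inspect neuron sparsity and the effective rank of the weight matrices. This is a minimal illustration assuming PyTorch; the architecture, synthetic data, and thresholds are illustrative choices, not the paper's experimental setup.

```python
# Minimal sketch (assumes PyTorch): train a small ReLU network with weight decay,
# then check for sparse neurons and low-rank weight matrices.
# Network size, data, and thresholds are illustrative, not from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (illustrative).
X = torch.randn(512, 10)
y = torch.sin(X[:, :1]) + 0.1 * torch.randn(512, 1)

model = nn.Sequential(
    nn.Linear(10, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1),
)

# Per the abstract, weight decay plays the role of the RBV2-norm regularizer.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-3)
loss_fn = nn.MSELoss()

for step in range(5000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# Sparsity: count hidden units whose incoming weights are (numerically) nonzero.
# Low rank: count singular values above a small relative threshold.
for i, layer in enumerate(m for m in model if isinstance(m, nn.Linear)):
    W = layer.weight.detach()
    active_units = (W.norm(dim=1) > 1e-3).sum().item()
    svals = torch.linalg.svdvals(W)
    eff_rank = (svals > 1e-3 * svals.max()).sum().item()
    print(f"layer {i}: {active_units}/{W.shape[0]} active units, "
          f"effective rank {eff_rank}")
```

With weight decay turned on, one would expect many hidden units to have near-zero weight norms and the weight matrices to have effective rank well below their full dimension; rerunning with `weight_decay=0.0` gives a baseline for comparison.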
Author Information
Rahul Parhi (University of Wisconsin-Madison)
Jack Wolf (University of Wisconsin-Madison)
Robert Nowak (University of Wisconsin-Madison)

Robert Nowak holds the Nosbusch Professorship in Engineering at the University of Wisconsin-Madison, where his research focuses on signal processing, machine learning, optimization, and statistics.
More from the Same Authors
- 2023: Algorithm Selection for Deep Active Learning with Imbalanced Datasets
  Jifan Zhang · Shuai Shao · Saurabh Verma · Robert Nowak
- 2023: LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning
  Jifan Zhang · Yifang Chen · Gregory Canal · Stephen Mussmann · Yinglun Zhu · Simon Du · Kevin Jamieson · Robert Nowak
- 2023: Looped Transformers are Better at Learning Learning Algorithms
  Liu Yang · Kangwook Lee · Robert Nowak · Dimitris Papailiopoulos
- 2023: SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits
  Subhojyoti Mukherjee · Qiaomin Xie · Josiah Hanna · Robert Nowak
- 2023: A Representer Theorem for Vector-Valued Neural Networks: Insights on Weight Decay Training and Widths of Deep Neural Networks
  Joseph Shenouda · Rahul Parhi · Kangwook Lee · Robert Nowak
- 2023 Oral: A Fully First-Order Method for Stochastic Bilevel Optimization
  Jeongyeol Kwon · Dohyun Kwon · Stephen Wright · Robert Nowak
- 2023 Poster: A Fully First-Order Method for Stochastic Bilevel Optimization
  Jeongyeol Kwon · Dohyun Kwon · Stephen Wright · Robert Nowak
- 2023 Poster: Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection
  Haoyue Bai · Gregory Canal · Xuefeng Du · Jeongyeol Kwon · Robert Nowak · Sharon Li
- 2022 Poster: GALAXY: Graph-based Active Learning at the Extreme
  Jifan Zhang · Julian Katz-Samuels · Robert Nowak
- 2022 Spotlight: GALAXY: Graph-based Active Learning at the Extreme
  Jifan Zhang · Julian Katz-Samuels · Robert Nowak
- 2022 Poster: Training OOD Detectors in their Natural Habitats
  Julian Katz-Samuels · Julia Nakhleh · Robert Nowak · Sharon Li
- 2022 Spotlight: Training OOD Detectors in their Natural Habitats
  Julian Katz-Samuels · Julia Nakhleh · Robert Nowak · Sharon Li
- 2020 Poster: Robust Outlier Arm Identification
  Yinglun Zhu · Sumeet Katariya · Robert Nowak
- 2019 Poster: Bilinear Bandits with Low-rank Structure
  Kwang-Sung Jun · Rebecca Willett · Stephen Wright · Robert Nowak
- 2019 Oral: Bilinear Bandits with Low-rank Structure
  Kwang-Sung Jun · Rebecca Willett · Stephen Wright · Robert Nowak
- 2019 Tutorial: Active Learning: From Theory to Practice
  Robert Nowak · Steve Hanneke
- 2017 Poster: Algebraic Variety Models for High-Rank Matrix Completion
  Greg Ongie · Laura Balzano · Rebecca Willett · Robert Nowak
- 2017 Talk: Algebraic Variety Models for High-Rank Matrix Completion
  Greg Ongie · Laura Balzano · Rebecca Willett · Robert Nowak