This paper characterizes the kinds of functions learned by multi-output (vector-valued) ReLU neural networks trained with weight decay. This extends previous results, which were limited to single-output networks, and is crucial to understanding the effects of weight decay on deep neural networks (DNNs). The new characterization requires the definition of a new class of neural function spaces that we call vector-valued variation (VV) spaces. By exploiting the (Banach) duality between the space of vector-valued measures and the space of vector-valued continuous functions, we prove via a novel representer theorem that neural networks (NNs) are optimal solutions to learning problems posed over VV spaces. The representer theorem shows that solutions to these learning problems exist as vector-valued NNs with widths bounded in terms of the number of training samples. Next, via a novel connection to the multi-task lasso problem, we derive data-dependent bounds on the widths of homogeneous layers in DNNs. These bounds are determined by the effective dimensions of the training data embeddings into and out of the layers. These results shed new light on the regularity of DNN functions trained with weight decay, as well as the kinds of architectures weight decay induces.
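The multi-task lasso connection is easy to illustrate: the multi-task lasso penalizes the sum of the l2 norms of each feature's coefficient row, so a feature is either used by all outputs or dropped by all of them at once. This shared (group) sparsity is the same mechanism by which, in the paper's analysis, neurons are pruned jointly across all outputs, bounding the layer width. The following is an illustrative sketch only (not code from the paper), using scikit-learn's `MultiTaskLasso` on synthetic data:

```python
# Illustrative sketch of the multi-task lasso's group sparsity
# (synthetic data; not code or results from the paper).
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
n, d, k = 100, 20, 3                        # samples, features, outputs
X = rng.standard_normal((n, d))
W_true = np.zeros((d, k))
W_true[:4] = rng.standard_normal((4, k))    # only 4 features are relevant
Y = X @ W_true

model = MultiTaskLasso(alpha=0.1).fit(X, Y)
W = model.coef_.T                           # shape (d, k)

# The penalty is sum_j ||W[j, :]||_2, so each feature's coefficient
# row is shrunk toward zero as a group across all k outputs.
row_norms = np.linalg.norm(W, axis=1)
active = np.flatnonzero(row_norms > 1e-8)
print("active features:", active)
```

The relevant features retain much larger coefficient rows than the irrelevant ones, and the support is shared across all outputs; in the paper, an analogous quantity controls how many neurons a homogeneous layer needs.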
Author Information
Joseph Shenouda (University of Wisconsin Madison)
Rahul Parhi (EPFL - EPF Lausanne)
Kangwook Lee (KAIST)
Robert Nowak (University of Wisconsin-Madison)

Robert Nowak holds the Nosbusch Professorship in Engineering at the University of Wisconsin-Madison, where his research focuses on signal processing, machine learning, optimization, and statistics.
More from the Same Authors
- 2021 : On the Sparsity of Deep Neural Networks in the Overparameterized Regime: An Empirical Study
  Rahul Parhi · Jack Wolf · Robert Nowak
- 2023 : Algorithm Selection for Deep Active Learning with Imbalanced Datasets
  Jifan Zhang · Shuai Shao · Saurabh Verma · Robert Nowak
- 2023 : LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning
  Jifan Zhang · Yifang Chen · Gregory Canal · Stephen Mussmann · Yinglun Zhu · Simon Du · Kevin Jamieson · Robert Nowak
- 2023 : Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
  Seongjun Yang · Gibbeum Lee · Jaewoong Cho · Dimitris Papailiopoulos · Kangwook Lee
- 2023 : Looped Transformers are Better at Learning Learning Algorithms
  Liu Yang · Kangwook Lee · Robert Nowak · Dimitris Papailiopoulos
- 2023 : SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits
  Subhojyoti Mukherjee · Qiaomin Xie · Josiah Hanna · Robert Nowak
- 2023 Oral: A Fully First-Order Method for Stochastic Bilevel Optimization
  Jeongyeol Kwon · Dohyun Kwon · Stephen Wright · Robert Nowak
- 2023 Poster: A Fully First-Order Method for Stochastic Bilevel Optimization
  Jeongyeol Kwon · Dohyun Kwon · Stephen Wright · Robert Nowak
- 2023 Poster: Looped Transformers as Programmable Computers
  Angeliki Giannou · Shashank Rajput · Jy-yong Sohn · Kangwook Lee · Jason Lee · Dimitris Papailiopoulos
- 2023 Poster: Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection
  Haoyue Bai · Gregory Canal · Xuefeng Du · Jeongyeol Kwon · Robert Nowak · Sharon Li
- 2023 Poster: Improving Fair Training under Correlation Shifts
  Yuji Roh · Kangwook Lee · Steven Whang · Changho Suh
- 2023 Poster: Optimizing DDPM Sampling with Shortcut Fine-Tuning
  Ying Fan · Kangwook Lee
- 2022 Poster: GALAXY: Graph-based Active Learning at the Extreme
  Jifan Zhang · Julian Katz-Samuels · Robert Nowak
- 2022 Spotlight: GALAXY: Graph-based Active Learning at the Extreme
  Jifan Zhang · Julian Katz-Samuels · Robert Nowak
- 2022 Poster: Training OOD Detectors in their Natural Habitats
  Julian Katz-Samuels · Julia Nakhleh · Robert Nowak · Sharon Li
- 2022 Spotlight: Training OOD Detectors in their Natural Habitats
  Julian Katz-Samuels · Julia Nakhleh · Robert Nowak · Sharon Li
- 2020 Poster: Robust Outlier Arm Identification
  Yinglun Zhu · Sumeet Katariya · Robert Nowak
- 2019 Poster: Bilinear Bandits with Low-rank Structure
  Kwang-Sung Jun · Rebecca Willett · Stephen Wright · Robert Nowak
- 2019 Oral: Bilinear Bandits with Low-rank Structure
  Kwang-Sung Jun · Rebecca Willett · Stephen Wright · Robert Nowak
- 2019 Tutorial: Active Learning: From Theory to Practice
  Robert Nowak · Steve Hanneke
- 2017 Poster: Algebraic Variety Models for High-Rank Matrix Completion
  Greg Ongie · Laura Balzano · Rebecca Willett · Robert Nowak
- 2017 Talk: Algebraic Variety Models for High-Rank Matrix Completion
  Greg Ongie · Laura Balzano · Rebecca Willett · Robert Nowak