A Representer Theorem for Vector-Valued Neural Networks: Insights on Weight Decay Training and Widths of Deep Neural Networks
Joseph Shenouda · Rahul Parhi · Kangwook Lee · Robert Nowak

This paper characterizes the kinds of functions learned by multi-output (vector-valued) ReLU neural networks trained with weight decay. This extends previous results that were limited to single-output networks, which is crucial to understanding the effects of weight decay on deep neural networks (DNNs). The new characterization requires the definition of a new class of neural function spaces that we call vector-valued variation (VV) spaces. By exploiting the (Banach) duality between the space of vector-valued measures and the space of vector-valued continuous functions, we prove that neural networks (NNs) are optimal solutions to learning problems posed over VV spaces via a novel representer theorem. Our representer theorem shows that solutions to these learning problems exist as vector-valued NNs with widths bounded in terms of the number of training samples. Next, via a novel connection to the multi-task lasso problem, we derive data-dependent bounds on the widths of homogeneous layers in DNNs. The bounds are determined by the effective dimensions of the training data embeddings in/out of the layers. These results shed new light on the regularity of DNN functions trained with weight decay as well as the kinds of architectures weight decay induces.
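As a rough illustration of why weight decay on homogeneous (ReLU) units relates to a lasso-type penalty, the sketch below checks a standard rescaling argument numerically. It is not taken from the abstract: the function names (`neuron`, `weight_decay`, `balance`) and the single-neuron setup are assumptions chosen for illustration. For one vector-valued neuron x -> v * relu(w . x), scaling w by a > 0 and v by 1/a leaves the function unchanged, and the smallest achievable weight decay penalty over such rescalings is the product of norms ||w|| ||v||, a group-sparsity-style quantity.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def neuron(x, w, v):
    # One vector-valued ReLU neuron: x -> v * relu(w . x)
    return v * relu(w @ x)

def weight_decay(w, v):
    # Weight decay penalty on this neuron's input and output weights
    return 0.5 * (np.dot(w, w) + np.dot(v, v))

def balance(w, v):
    # Rescale so that ||w|| == ||v||. Because relu is positively
    # homogeneous, the neuron's function is unchanged.
    a = np.sqrt(np.linalg.norm(v) / np.linalg.norm(w))
    return a * w, v / a

rng = np.random.default_rng(0)
x = rng.normal(size=5)   # hypothetical input
w = rng.normal(size=5)   # hypothetical input weights
v = rng.normal(size=3)   # hypothetical output weights (3 outputs)

w_bal, v_bal = balance(w, v)
same_function = np.allclose(neuron(x, w, v), neuron(x, w_bal, v_bal))
penalty_before = weight_decay(w, v)
penalty_after = weight_decay(w_bal, v_bal)
product_of_norms = np.linalg.norm(w) * np.linalg.norm(v)
```

Under these assumptions, `penalty_after` matches `product_of_norms` and never exceeds `penalty_before`. Summed over neurons, this gives a penalty on unsquared norms of output-weight vectors, which is the kind of structure that makes a connection to the multi-task lasso plausible.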

Author Information

Joseph Shenouda (University of Wisconsin-Madison)
Rahul Parhi (EPFL - EPF Lausanne)
Kangwook Lee (KAIST)
Robert Nowak (University of Wisconsin-Madison)
Robert Nowak

Robert Nowak holds the Nosbusch Professorship in Engineering at the University of Wisconsin-Madison, where his research focuses on signal processing, machine learning, optimization, and statistics.

More from the Same Authors