Poster
in
Workshop: Duality Principles for Modern Machine Learning
A Representer Theorem for Vector-Valued Neural Networks: Insights on Weight Decay Training and Widths of Deep Neural Networks
Joseph Shenouda · Rahul Parhi · Kangwook Lee · Robert Nowak
Keywords: [ deep neural networks ] [ regularization ] [ multi-task lasso ] [ weight decay ] [ representer theorem ] [ Banach duality ]
This paper characterizes the kinds of functions learned by multi-output (vector-valued) ReLU neural networks trained with weight decay. This extends previous results that were limited to single-output networks and is crucial to understanding the effects of weight decay on deep neural networks (DNNs). The new characterization requires defining a new class of neural function spaces that we call vector-valued variation (VV) spaces. By exploiting the (Banach) duality between the space of vector-valued measures and the space of vector-valued continuous functions, we prove a novel representer theorem showing that neural networks (NNs) are optimal solutions to learning problems posed over VV spaces. Our representer theorem shows that solutions to these learning problems exist as vector-valued NNs with widths bounded in terms of the number of training samples. Next, via a novel connection to the multi-task lasso problem, we derive data-dependent bounds on the widths of homogeneous layers in DNNs. The bounds are determined by the effective dimensions of the training data embeddings into and out of the layers. These results shed new light on the regularity of DNN functions trained with weight decay as well as the kinds of architectures weight decay induces.
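As a rough illustration of the setting (the notation $K$, $v_k$, $w_k$, $\lambda$ below is ours and not taken from the paper), consider a shallow vector-valued ReLU network
$$
f(x) \;=\; \sum_{k=1}^{K} v_k \,\sigma(w_k^\top x), \qquad v_k \in \mathbb{R}^D,\; w_k \in \mathbb{R}^d .
$$
Weight-decay training of such a network takes the form
$$
\min_{\{v_k, w_k\}} \;\sum_{i=1}^{N} \ell\big(f(x_i), y_i\big) \;+\; \frac{\lambda}{2} \sum_{k=1}^{K} \big(\|v_k\|_2^2 + \|w_k\|_2^2\big),
$$
and, by the positive homogeneity of the ReLU, the penalty is equivalent (after optimally rescaling each neuron) to the group-sparse term $\lambda \sum_{k} \|v_k\|_2 \|w_k\|_2$, which resembles a multi-task lasso penalty over neurons. This is only a sketch of the kind of objective the abstract refers to; the representer theorem in the paper concerns the associated learning problems over VV spaces and guarantees solutions of this neural form with width bounded in terms of the number of training samples $N$.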