

Poster in Workshop: Duality Principles for Modern Machine Learning

A Representer Theorem for Vector-Valued Neural Networks: Insights on Weight Decay Training and Widths of Deep Neural Networks

Joseph Shenouda · Rahul Parhi · Kangwook Lee · Robert Nowak

Keywords: [ deep neural networks ] [ regularization ] [ multi-task lasso ] [ weight decay ] [ representer theorem ] [ Banach duality ]


Abstract:

This paper characterizes the kinds of functions learned by multi-output (vector-valued) ReLU neural networks trained with weight decay. This extends previous results that were limited to single-output networks and is crucial to understanding the effects of weight decay on deep neural networks (DNNs). The new characterization requires the definition of a new class of neural function spaces that we call vector-valued variation (VV) spaces. By exploiting the (Banach) duality between the space of vector-valued measures and the space of vector-valued continuous functions, we prove via a novel representer theorem that neural networks (NNs) are optimal solutions to learning problems posed over VV spaces. Our representer theorem shows that solutions to these learning problems exist as vector-valued NNs with widths bounded in terms of the number of training samples. Next, via a novel connection to the multi-task lasso problem, we derive data-dependent bounds on the widths of homogeneous layers in DNNs. The bounds are determined by the effective dimensions of the training data embeddings into and out of the layers. These results shed new light on the regularity of DNN functions trained with weight decay, as well as on the kinds of architectures weight decay induces.
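To make the connection between weight decay and the multi-task lasso concrete, here is a minimal sketch for a shallow vector-valued ReLU network; the notation ($K$, $v_k$, $w_k$, $\lambda$) is introduced here for illustration and is not taken from the paper.

% Sketch only: a width-$K$ network $f(x) = \sum_{k=1}^{K} v_k\,\mathrm{ReLU}(w_k^\top x)$
% with input weights $w_k \in \mathbb{R}^d$ and output weights $v_k \in \mathbb{R}^D$,
% trained with weight decay on both weight groups.
\[
\min_{\{v_k, w_k\}_{k=1}^{K}} \; \sum_{n=1}^{N} \mathcal{L}\!\left(y_n,\; \sum_{k=1}^{K} v_k\,\mathrm{ReLU}(w_k^\top x_n)\right) \;+\; \frac{\lambda}{2} \sum_{k=1}^{K} \left(\lVert v_k \rVert_2^2 + \lVert w_k \rVert_2^2\right).
\]
% Because the ReLU is positively homogeneous, rescaling $(v_k, w_k) \mapsto (c\,v_k,\, w_k/c)$
% leaves the network unchanged, so optimizing over such rescalings replaces the
% weight-decay penalty with the group penalty
\[
\lambda \sum_{k=1}^{K} \lVert v_k \rVert_2\, \lVert w_k \rVert_2,
\]
% which, after normalizing the input weights, is a sum of Euclidean norms of the
% output-weight vectors -- the same group-sparsity structure as the multi-task lasso.

Group penalties of this form tend to zero out entire neurons, which gives an informal intuition for why optimal solutions can be realized by networks whose widths are bounded in terms of the training data.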
