Poster
in
Workshop: Neural Compression: From Information Theory to Applications
Slicing Mutual Information Generalization Bounds for Neural Networks
Kimia Nadjahi · Kristjan Greenewald · Rickard Gabrielsson · Justin Solomon
Abstract:
The ability of machine learning (ML) algorithms to generalize well to unseen data has been studied through the lens of information theory, by bounding the generalization error with the input-output mutual information (MI), i.e. the MI between the training data and the learned hypothesis. These bounds have limited empirical use for modern ML applications (e.g. deep learning) since the evaluation of MI is difficult in high-dimensional settings. Motivated by recent reports of significant low-loss compressibility of neural networks, we study the generalization capacity of algorithms which *slice* the parameter space, i.e. train on a random lower-dimensional subspace. We derive information-theoretic bounds on the generalization error in this regime, and discuss an intriguing connection to the $k$-Sliced Mutual Information, an alternative measure of statistical dependence which scales well with dimension. The computational and statistical benefits of our approach allow us to empirically estimate the input-output information of these neural networks and compute their information-theoretic generalization bounds, a task which was previously out of reach.
Chat is not available.