We study the estimation of the mutual information I(X;Tℓ) between the input X to a deep neural network (DNN) and the output vector Tℓ of its ℓ-th hidden layer (an “internal representation”). Focusing on feedforward networks with fixed weights and noisy internal representations, we develop a rigorous framework for accurate estimation of I(X;Tℓ). By relating I(X;Tℓ) to information transmission over additive white Gaussian noise channels, we reveal that compression, i.e., a reduction in I(X;Tℓ) over the course of training, is driven by progressive geometric clustering of the representations of samples from the same class. Experimental results verify this connection. Finally, we shift focus to purely deterministic DNNs, where I(X;Tℓ) is provably vacuous, and show that these models nevertheless also cluster inputs belonging to the same class. The binning-based approximation of I(X;Tℓ) employed in past works to measure compression is identified as a measure of clustering, which clarifies that those experiments were in fact tracking the same clustering phenomenon. Leveraging the clustering perspective, we provide new evidence that compression and generalization may not be causally related, and we discuss potential directions for future research.
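The link between the binning-based approximation and clustering can be illustrated with a small sketch. It is not the authors' estimator. It assumes a deterministic network and a finite set of inputs, in which case the binned mutual information reduces to the entropy of the binned representation. The function name `binned_entropy_bits`, the bin width, and the synthetic 2-D "representations" are all hypothetical choices made for illustration.

```python
import numpy as np

def binned_entropy_bits(activations, bin_width=0.5):
    """Entropy (in bits) of the empirical distribution of binned activation vectors.

    Assumption: the network is deterministic and the inputs form a finite set,
    so this entropy equals the binned mutual-information proxy.
    """
    acts = np.asarray(activations, dtype=float)
    # Map every coordinate to an integer bin index.
    bins = np.floor(acts / bin_width).astype(np.int64)
    # Count how many samples fall into each distinct binned vector.
    _, counts = np.unique(bins, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)

# Hypothetical hidden representations of 200 samples, scattered over [-1, 1]^2.
spread = rng.uniform(-1.0, 1.0, size=(200, 2))

# The same number of samples, tightly clustered around two (hypothetical) class centers.
centers = np.array([[-0.75, -0.75], [0.75, 0.75]])
labels = rng.integers(0, 2, size=200)
clustered = centers[labels] + rng.normal(0.0, 0.01, size=(200, 2))

h_spread = binned_entropy_bits(spread)
h_clustered = binned_entropy_bits(clustered)
print(f"scattered: {h_spread:.2f} bits, clustered: {h_clustered:.2f} bits")
```

In this toy setting the clustered representations occupy far fewer bins than the scattered ones, so the binned quantity drops even though the map from input to representation is still deterministic. That is the sense in which a binning-based measurement behaves like a measure of clustering.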
Author Information
Ziv Goldfeld (MIT)
Ewout van den Berg (IBM)
Kristjan Greenewald (IBM)
Igor Melnyk (IBM)
Nam Nguyen (IBM Research AI)
Brian Kingsbury (IBM Research)
Yury Polyanskiy (MIT)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Estimating Information Flow in Deep Neural Networks
  Thu Jun 13th 01:30 -- 04:00 AM Room Pacific Ballroom
More from the Same Authors
- 2019 Poster: Bayesian Nonparametric Federated Learning of Neural Networks
  Mikhail Yurochkin · Mayank Agarwal · Soumya Ghosh · Kristjan Greenewald · Nghia Hoang · Yasaman Khazaeni
- 2019 Oral: Bayesian Nonparametric Federated Learning of Neural Networks
  Mikhail Yurochkin · Mayank Agarwal · Soumya Ghosh · Kristjan Greenewald · Nghia Hoang · Yasaman Khazaeni
- 2019 Poster: Beyond Backprop: Online Alternating Minimization with Auxiliary Variables
  Anna Choromanska · Benjamin Cowen · Sadhana Kumaravel · Ronny Luss · Mattia Rigotti · Irina Rish · Paolo DiAchille · Viatcheslav Gurev · Brian Kingsbury · Ravi Tejwani · Djallel Bouneffouf
- 2019 Oral: Beyond Backprop: Online Alternating Minimization with Auxiliary Variables
  Anna Choromanska · Benjamin Cowen · Sadhana Kumaravel · Ronny Luss · Mattia Rigotti · Irina Rish · Paolo DiAchille · Viatcheslav Gurev · Brian Kingsbury · Ravi Tejwani · Djallel Bouneffouf