Don't trust your eyes: on the (un)reliability of feature visualizations
How do neural networks extract patterns from pixels? Feature visualizations attempt to answer this question by visualizing highly activating patterns through optimization. Today, visualization methods form the foundation of our knowledge about the internal workings of neural networks, serving as a type of mechanistic interpretability. Here we ask: how reliable are feature visualizations? We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks: feature visualizations are processed very differently from standard input, casting doubt on their ability to "explain" how neural networks process natural images. We underpin this empirical finding with theory, proving that the set of functions that can be reliably understood by feature visualization is extremely small and does not include black-box neural networks.
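For readers unfamiliar with the technique, the sketch below shows the core of the feature visualization procedure the abstract refers to: activation maximization, i.e., gradient-ascending an input image, starting from noise, so that it maximally activates one unit. It is a minimal illustration only; the model (torchvision's GoogLeNet), the layer `inception4c`, the unit index, and the step count are assumptions made for the example, and the image-parameterization and regularization tricks that practical visualization methods rely on are omitted.

```python
# Minimal sketch of feature visualization via activation maximization.
# Model, layer, and unit choices are illustrative assumptions; practical
# methods add regularizers (transformations, frequency penalties) omitted here.
import torch
import torchvision.models as models

model = models.googlenet(weights="DEFAULT").eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the input image is optimized

# Record the activations of one intermediate layer via a forward hook.
activations = {}
def save_activation(module, inputs, output):
    activations["feat"] = output

model.inception4c.register_forward_hook(save_activation)  # illustrative layer

# Start from random noise and maximize one channel's mean activation.
x = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([x], lr=0.05)
channel = 0  # illustrative unit index

for step in range(256):
    optimizer.zero_grad()
    model(x)  # hook stores this forward pass's activations
    loss = -activations["feat"][0, channel].mean()  # negated: ascend, not descend
    loss.backward()
    optimizer.step()

# x.detach() now holds the "highly activating pattern" for the chosen unit.
```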
Author Information
Robert Geirhos (Google DeepMind)
Roland S. Zimmermann (University of Tübingen, MPI-IS)
Blair Bilodeau (University of Toronto)
Wieland Brendel (University of Tübingen)
Been Kim (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)
- 2023: Don't trust your eyes: on the (un)reliability of feature visualizations
More from the Same Authors
- 2021: How Well do Feature Visualizations Support Causal Understanding of CNN Activations?
  Roland S. Zimmermann · Judith Borowski · Robert Geirhos · Matthias Bethge · Thomas SA Wallis · Wieland Brendel
- 2021: Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints
  Maura Pintor · Fabio Roli · Wieland Brendel · Battista Biggio
- 2022: ImageNet-D: A new challenging robustness dataset inspired by domain adaptation
  Evgenia Rusak · Steffen Schneider · Peter V Gehler · Oliver Bringmann · Wieland Brendel · Matthias Bethge
- 2023: TabCBM: Concept-based Interpretable Neural Networks for Tabular Data
  Mateo Espinosa Zarlenga · Zohreh Shams · Michael Nelson · Been Kim · Mateja Jamnik
- 2023: Desiderata for Representation Learning from Identifiability, Disentanglement, and Group-Structuredness
  Hamza Keurti · Patrik Reizinger · Bernhard Schölkopf · Wieland Brendel
- 2023 Poster: Provably Learning Object-Centric Representations
  Jack Brady · Roland S. Zimmermann · Yash Sharma · Bernhard Schölkopf · Julius von Kügelgen · Wieland Brendel
- 2023 Poster: On the Relationship Between Explanation and Prediction: A Causal View
  Amir-Hossein Karimi · Krikamol Muandet · Simon Kornblith · Bernhard Schölkopf · Been Kim
- 2023 Poster: Scaling Vision Transformers to 22 Billion Parameters
  Mostafa Dehghani · Josip Djolonga · Basil Mustafa · Piotr Padlewski · Jonathan Heek · Justin Gilmer · Andreas Steiner · Mathilde Caron · Robert Geirhos · Ibrahim Alabdulmohsin · Rodolphe Jenatton · Lucas Beyer · Michael Tschannen · Anurag Arnab · Xiao Wang · Carlos Riquelme · Matthias Minderer · Joan Puigcerver · Utku Evci · Manoj Kumar · Sjoerd van Steenkiste · Gamaleldin Elsayed · Aravindh Mahendran · Fisher Yu · Avital Oliver · Fantine Huot · Jasmijn Bastings · Mark Collier · Alexey Gritsenko · Vighnesh N Birodkar · Cristina Vasconcelos · Yi Tay · Thomas Mensink · Alexander Kolesnikov · Filip Pavetic · Dustin Tran · Thomas Kipf · Mario Lucic · Xiaohua Zhai · Daniel Keysers · Jeremiah Harmsen · Neil Houlsby
- 2023 Oral: Provably Learning Object-Centric Representations
  Jack Brady · Roland S. Zimmermann · Yash Sharma · Bernhard Schölkopf · Julius von Kügelgen · Wieland Brendel
- 2023 Oral: Scaling Vision Transformers to 22 Billion Parameters
  Mostafa Dehghani · Josip Djolonga · Basil Mustafa · Piotr Padlewski · Jonathan Heek · Justin Gilmer · Andreas Steiner · Mathilde Caron · Robert Geirhos · Ibrahim Alabdulmohsin · Rodolphe Jenatton · Lucas Beyer · Michael Tschannen · Anurag Arnab · Xiao Wang · Carlos Riquelme · Matthias Minderer · Joan Puigcerver · Utku Evci · Manoj Kumar · Sjoerd van Steenkiste · Gamaleldin Elsayed · Aravindh Mahendran · Fisher Yu · Avital Oliver · Fantine Huot · Jasmijn Bastings · Mark Collier · Alexey Gritsenko · Vighnesh N Birodkar · Cristina Vasconcelos · Yi Tay · Thomas Mensink · Alexander Kolesnikov · Filip Pavetic · Dustin Tran · Thomas Kipf · Mario Lucic · Xiaohua Zhai · Daniel Keysers · Jeremiah Harmsen · Neil Houlsby
- 2022 Workshop: Shift happens: Crowdsourcing metrics and test datasets beyond ImageNet
  Roland S. Zimmermann · Julian Bitterwolf · Evgenia Rusak · Steffen Schneider · Matthias Bethge · Wieland Brendel · Matthias Hein
- 2021 Poster: Contrastive Learning Inverts the Data Generating Process
  Roland S. Zimmermann · Yash Sharma · Steffen Schneider · Matthias Bethge · Wieland Brendel
- 2021 Spotlight: Contrastive Learning Inverts the Data Generating Process
  Roland S. Zimmermann · Yash Sharma · Steffen Schneider · Matthias Bethge · Wieland Brendel
- 2020 Poster: Improved Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance
  Blair Bilodeau · Dylan Foster · Daniel Roy
- 2017 Workshop: Workshop on Human Interpretability in Machine Learning (WHI)
  Kush Varshney · Adrian Weller · Been Kim · Dmitry Malioutov
- 2017 Tutorial: Interpretable Machine Learning
  Been Kim · Finale Doshi-Velez