XAI for Transformers: Better Explanations through Conservative Propagation

Abstract
Transformers have become an important workhorse of machine learning, with numerous applications. This necessitates the development of reliable methods for increasing their transparency. Multiple interpretability methods, often based on gradient information, have been proposed. We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction. We identify attention heads and LayerNorm as the main reasons for such unreliable explanations and propose a more stable way to propagate through these layers. Our proposal, which can be seen as a proper extension of the well-established LRP method to Transformers, is shown both theoretically and empirically to overcome the deficiency of a simple gradient-based approach, and achieves state-of-the-art explanation performance on a broad range of Transformer models and datasets.
Author Information
Ameen Ali (Tel Aviv University, Israel)
Thomas Schnake (TU Berlin)
Oliver Eberle (TU Berlin)
Grégoire Montavon (Technische Universität Berlin)
Klaus-Robert Müller (Technische Universität Berlin)
Lior Wolf (Facebook AI Research and Tel Aviv University)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: XAI for Transformers: Better Explanations through Conservative Propagation
  Tue, Jul 19, 6:30–6:35 PM, Ballroom 3 & 4
More from the Same Authors
- 2023 Poster: Relevant Walk Search for Explaining Graph Neural Networks
  Ping Xiong · Thomas Schnake · Michael Gastegger · Grégoire Montavon · Klaus-Robert Müller · Shinichi Nakajima
- 2023 Poster: OCD: Learning to Overfit with Conditional Diffusion Models
  Shahar Lutati · Lior Wolf
- 2023 Oral: OCD: Learning to Overfit with Conditional Diffusion Models
  Shahar Lutati · Lior Wolf
- 2022 Poster: Neural Inverse Kinematic
  Raphael Bensadoun · Shir Gur · Nitsan Blau · Lior Wolf
- 2022 Spotlight: Neural Inverse Kinematic
  Raphael Bensadoun · Shir Gur · Nitsan Blau · Lior Wolf
- 2022 Poster: Efficient Computation of Higher-Order Subgraph Attribution via Message Passing
  Ping Xiong · Thomas Schnake · Grégoire Montavon · Klaus-Robert Müller · Shinichi Nakajima
- 2022 Spotlight: Efficient Computation of Higher-Order Subgraph Attribution via Message Passing
  Ping Xiong · Thomas Schnake · Grégoire Montavon · Klaus-Robert Müller · Shinichi Nakajima
- 2021 Invited Talk 2: Toward Explainable AI (12:52–1:45 PM UTC)
  Klaus-Robert Müller · Wojciech Samek · Grégoire Montavon
- 2021 Poster: Recovering AES Keys with a Deep Cold Boot Attack
  Itamar Zimerman · Eliya Nachmani · Lior Wolf
- 2021 Spotlight: Recovering AES Keys with a Deep Cold Boot Attack
  Itamar Zimerman · Eliya Nachmani · Lior Wolf
- 2021 Poster: HyperHyperNetwork for the Design of Antenna Arrays
  Shahar Lutati · Lior Wolf
- 2021 Spotlight: HyperHyperNetwork for the Design of Antenna Arrays
  Shahar Lutati · Lior Wolf
- 2020 Workshop: XXAI: Extending Explainable AI Beyond Deep Models and Classifiers
  Wojciech Samek · Andreas Holzinger · Ruth Fong · Taesup Moon · Klaus-Robert Müller
- 2020 Poster: Fairwashing explanations with off-manifold detergent
  Christopher Anders · Plamen Pasliev · Ann-Kathrin Dombrowski · Klaus-Robert Müller · Pan Kessel
- 2020 Poster: Voice Separation with an Unknown Number of Multiple Speakers
  Eliya Nachmani · Yossi Adi · Lior Wolf
- 2018 Poster: Fitting New Speakers Based on a Short Untranscribed Sample
  Eliya Nachmani · Adam Polyak · Yaniv Taigman · Lior Wolf
- 2018 Oral: Fitting New Speakers Based on a Short Untranscribed Sample
  Eliya Nachmani · Adam Polyak · Yaniv Taigman · Lior Wolf
- 2017 Poster: Minimizing Trust Leaks for Robust Sybil Detection
  János Höner · Shinichi Nakajima · Alexander Bauer · Klaus-Robert Müller · Nico Görnitz
- 2017 Talk: Minimizing Trust Leaks for Robust Sybil Detection
  János Höner · Shinichi Nakajima · Alexander Bauer · Klaus-Robert Müller · Nico Görnitz
- 2017 Poster: Learning to Align the Source Code to the Compiled Object Code
  Dor Levy · Lior Wolf
- 2017 Talk: Learning to Align the Source Code to the Compiled Object Code
  Dor Levy · Lior Wolf