Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
Abstract:
As machine learning black boxes are increasingly deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner. In this work, we analyze two popular post hoc interpretation techniques: SmoothGrad, which is a gradient-based method, and a variant of LIME, which is a perturbation-based method. More specifically, we derive explicit closed-form expressions for the explanations output by these two methods and show that they both converge to the same explanation in expectation, i.e., when the number of perturbed samples used by these methods is large. We then leverage this connection to establish other desirable properties, such as robustness, for these techniques. We also derive finite-sample complexity bounds on the number of perturbations required for these methods to converge to their expected explanations. Finally, we empirically validate our theory through extensive experiments on both synthetic and real-world datasets.
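To make the connection concrete, below is a minimal sketch (not the authors' code) of the setup the abstract describes: a SmoothGrad-style attribution, i.e., the average gradient over Gaussian perturbations, is compared against a LIME-style attribution obtained by fitting an unweighted least-squares linear surrogate to the model's outputs on the same kind of Gaussian perturbations. The toy model f, the perturbation scale sigma, the sample count n_samples, and the function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable model f: R^3 -> R with a known gradient (illustrative choice).
w_true = np.array([2.0, -1.0, 0.5])

def f(x):
    return np.tanh(x @ w_true)

def grad_f(x):
    return (1.0 - np.tanh(x @ w_true) ** 2) * w_true  # chain rule

def smoothgrad(x, sigma=0.5, n_samples=20000):
    """SmoothGrad-style attribution: average gradient over Gaussian perturbations of x."""
    noise = rng.normal(0.0, sigma, size=(n_samples, x.size))
    return np.mean([grad_f(x + z) for z in noise], axis=0)

def lime_gaussian(x, sigma=0.5, n_samples=20000):
    """LIME-style attribution: slope of an unweighted least-squares linear surrogate
    fit to f on Gaussian perturbations of x (a simplified stand-in for the LIME variant)."""
    X = x + rng.normal(0.0, sigma, size=(n_samples, x.size))
    y = f(X)
    A = np.hstack([np.ones((n_samples, 1)), X])   # fit y ~ a + X @ b
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]                               # return the slope b

x0 = np.array([0.3, -0.2, 0.1])
print("SmoothGrad attribution:   ", smoothgrad(x0))
print("Gaussian-LIME attribution:", lime_gaussian(x0))
```

With a large n_samples the two attribution vectors nearly coincide; intuitively, for Gaussian perturbations the slope of the best linear fit equals the expected gradient (a Stein's-lemma-type identity), which is the kind of equivalence the paper's closed-form analysis and finite-sample bounds make precise.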
Author Information
Sushant Agarwal (University of Waterloo)
Shahin Jabbari (Harvard University)
Chirag Agarwal (Harvard University)
Sohini Upadhyay (Harvard University)
Steven Wu (Carnegie Mellon University)
Hima Lakkaraju (Harvard University)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Spotlight: Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
Wed. Jul 21st 12:40 -- 12:45 PM
More from the Same Authors
-
2021 : Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
· Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju -
2021 : On the Connections between Counterfactual Explanations and Adversarial Examples »
· Martin Pawelczyk · Shalmali Joshi · Chirag Agarwal · Sohini Upadhyay · Hima Lakkaraju -
2021 : Towards a Rigorous Theoretical Analysis and Evaluation of GNN Explanations »
· Chirag Agarwal · Marinka Zitnik · Hima Lakkaraju -
2021 : What will it take to generate fairness-preserving explanations? »
· Jessica Dai · Sohini Upadhyay · Hima Lakkaraju -
2021 : Feature Attributions and Counterfactual Explanations Can Be Manipulated »
· Dylan Slack · Sophie Hilgard · Sameer Singh · Hima Lakkaraju -
2021 : Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic Responses »
Keegan Harris · Dung Ngo · Logan Stapleton · Hoda Heidari · Steven Wu -
2021 : Stateful Strategic Regression »
Keegan Harris · Hoda Heidari · Steven Wu -
2021 : Towards Robust and Reliable Algorithmic Recourse »
Sohini Upadhyay · Shalmali Joshi · Hima Lakkaraju -
2021 : Iterative Methods for Private Synthetic Data: Unifying Framework and New Methods »
Terrance Liu · Giuseppe Vietri · Steven Wu -
2021 : Private Multi-Task Learning: Formulation and Applications to Federated Learning »
Shengyuan Hu · Steven Wu · Virginia Smith -
2021 : Understanding Clipped FedAvg: Convergence and Client-Level Differential Privacy »
Xinwei Zhang · Xiangyi Chen · Steven Wu · Mingyi Hong -
2021 : Improved Privacy Filters and Odometers: Time-Uniform Bounds in Privacy Composition »
Justin Whitehouse · Aaditya Ramdas · Ryan Rogers · Steven Wu -
2021 : Reliable Post hoc Explanations: Modeling Uncertainty in Explainability »
Dylan Slack · Sophie Hilgard · Sameer Singh · Hima Lakkaraju -
2021 : Towards a Unified Framework for Fair and Stable Graph Representation Learning »
Chirag Agarwal · Hima Lakkaraju · Marinka Zitnik -
2021 : Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap »
Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2021 : Scalable Algorithms for Nonlinear Causal Inference »
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu -
2021 : Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap »
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu -
2022 : Meta-Learning Adversarial Bandits »
Nina Balcan · Keegan Harris · Mikhail Khodak · Steven Wu -
2023 : Complementing a Policy with a Different Observation Space »
Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2023 : Towards Fair Knowledge Distillation using Student Feedback »
Abhinav Java · Surgan Jandial · Chirag Agarwal -
2023 : Adaptive Principal Component Regression with Applications to Panel Data »
Anish Agarwal · Keegan Harris · Justin Whitehouse · Steven Wu -
2023 : Strategyproof Decision-Making in Panel Data Settings and Beyond »
Keegan Harris · Anish Agarwal · Chara Podimata · Steven Wu -
2023 : Counterfactual Explanation Policies in RL »
Shripad Deshmukh · Srivatsan R · Supriti Vijay · Jayakumar Subramanian · Chirag Agarwal -
2023 : Fair Machine Unlearning: Data Removal while Mitigating Disparities »
Alex Oesterling · Jiaqi Ma · Flavio Calmon · Hima Lakkaraju -
2023 : Evaluating the Causal Reasoning Abilities of Large Language Models »
Isha Puri · Hima Lakkaraju -
2023 : Strategic Apple Tasting »
Keegan Harris · Chara Podimata · Steven Wu -
2023 : Learning Shared Safety Constraints from Multi-task Demonstrations »
Konwoo Kim · Gokul Swamy · Zuxin Liu · Ding Zhao · Sanjiban Choudhury · Steven Wu -
2023 : Himabindu Lakkaraju - Regulating Explainable AI: Technical Challenges and Opportunities »
Hima Lakkaraju -
2023 : Efficient Estimation of Local Robustness of Machine Learning Models »
Tessa Han · Suraj Srinivas · Hima Lakkaraju -
2023 Poster: Fully-Adaptive Composition in Differential Privacy »
Justin Whitehouse · Aaditya Ramdas · Ryan Rogers · Steven Wu -
2023 Oral: Nonparametric Extensions of Randomized Response for Private Confidence Sets »
Ian Waudby-Smith · Steven Wu · Aaditya Ramdas -
2023 Poster: Nonparametric Extensions of Randomized Response for Private Confidence Sets »
Ian Waudby-Smith · Steven Wu · Aaditya Ramdas -
2023 Poster: Inverse Reinforcement Learning without Reinforcement Learning »
Gokul Swamy · David Wu · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2023 Poster: Generating Private Synthetic Data with Genetic Algorithms »
Terrance Liu · Jingwu Tang · Giuseppe Vietri · Steven Wu -
2023 Tutorial: Responsible AI for Generative AI in Practice: Lessons Learned and Open Challenges »
Krishnaram Kenthapadi · Hima Lakkaraju · Nazneen Rajani -
2022 Workshop: New Frontiers in Adversarial Machine Learning »
Sijia Liu · Pin-Yu Chen · Dongxiao Zhu · Eric Wong · Kathrin Grosse · Hima Lakkaraju · Sanmi Koyejo -
2022 Poster: Information Discrepancy in Strategic Learning »
Yahav Bechavod · Chara Podimata · Steven Wu · Juba Ziani -
2022 Poster: Constrained Variational Policy Optimization for Safe Reinforcement Learning »
Zuxin Liu · Zhepeng Cen · Vladislav Isenbaev · Wei Liu · Steven Wu · Bo Li · Ding Zhao -
2022 Poster: Causal Imitation Learning under Temporally Correlated Noise »
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu -
2022 Spotlight: Constrained Variational Policy Optimization for Safe Reinforcement Learning »
Zuxin Liu · Zhepeng Cen · Vladislav Isenbaev · Wei Liu · Steven Wu · Bo Li · Ding Zhao -
2022 Spotlight: Information Discrepancy in Strategic Learning »
Yahav Bechavod · Chara Podimata · Steven Wu · Juba Ziani -
2022 Oral: Causal Imitation Learning under Temporally Correlated Noise »
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu -
2022 Poster: Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic Responses »
Keegan Harris · Dung Ngo · Logan Stapleton · Hoda Heidari · Steven Wu -
2022 Poster: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning »
Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu -
2022 Poster: Improved Regret for Differentially Private Exploration in Linear MDP »
Dung Ngo · Giuseppe Vietri · Steven Wu -
2022 Poster: Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy »
Xinwei Zhang · Xiangyi Chen · Mingyi Hong · Steven Wu · Jinfeng Yi -
2022 Spotlight: Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy »
Xinwei Zhang · Xiangyi Chen · Mingyi Hong · Steven Wu · Jinfeng Yi -
2022 Spotlight: Improved Regret for Differentially Private Exploration in Linear MDP »
Dung Ngo · Giuseppe Vietri · Steven Wu -
2022 Spotlight: Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic Responses »
Keegan Harris · Dung Ngo · Logan Stapleton · Hoda Heidari · Steven Wu -
2022 Spotlight: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning »
Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu -
2022 Social: Trustworthy Machine Learning Social »
Haohan Wang · Sarah Tan · Chirag Agarwal · Chhavi Yadav · Jaydeep Borkar -
2021 Workshop: ICML Workshop on Algorithmic Recourse »
Stratis Tsirtsis · Amir-Hossein Karimi · Ana Lucic · Manuel Gomez-Rodriguez · Isabel Valera · Hima Lakkaraju -
2021 : Towards Robust and Reliable Model Explanations for Healthcare »
Hima Lakkaraju -
2021 Poster: Leveraging Public Data for Practical Private Query Release »
Terrance Liu · Giuseppe Vietri · Thomas Steinke · Jonathan Ullman · Steven Wu -
2021 Spotlight: Leveraging Public Data for Practical Private Query Release »
Terrance Liu · Giuseppe Vietri · Thomas Steinke · Jonathan Ullman · Steven Wu -
2021 Poster: Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap »
Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2021 Spotlight: Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap »
Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2021 Poster: Incentivizing Compliance with Algorithmic Instruments »
Dung Ngo · Logan Stapleton · Vasilis Syrgkanis · Steven Wu -
2021 Spotlight: Incentivizing Compliance with Algorithmic Instruments »
Dung Ngo · Logan Stapleton · Vasilis Syrgkanis · Steven Wu -
2020 Poster: Robust and Stable Black Box Explanations »
Hima Lakkaraju · Nino Arsov · Osbert Bastani