Counterfactual explanations and adversarial examples have emerged as critical research areas for addressing the explainability and robustness goals of machine learning (ML). While counterfactual explanations were developed with the goal of providing recourse to individuals adversely impacted by algorithmic decisions, adversarial examples were designed to expose the vulnerabilities of ML models. In this work, we make one of the first attempts at formalizing the connections between counterfactual explanations and adversarial examples. More specifically, we theoretically analyze salient counterfactual explanation and adversarial example generation methods, and highlight the conditions under which they behave similarly. Our analysis demonstrates that several popular counterfactual explanation and adversarial example generation methods are equivalent: the method of Wachter et al. and the Carlini and Wagner attack (with mean squared error loss), as well as C-CHVAE and the natural adversarial examples of Zhao et al. We also bound the distance between the counterfactual explanations and adversarial examples generated by the Wachter et al. and DeepFool methods for linear models.
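To make the structural similarity concrete, the sketch below contrasts a Wachter-style counterfactual search (squared prediction loss plus an L2 distance penalty) with a C&W-style attack using a squared-error loss on the model output. It is a minimal illustration, assuming a fixed, differentiable toy model and plain gradient descent; the function names, toy logistic model, and hyperparameters are assumptions for illustration and do not reproduce the authors' implementations.

import numpy as np

# Toy differentiable classifier (logistic regression with fixed weights); illustrative only.
w = np.array([1.0, -2.0])
b = 0.5

def f(x):
    # Model's predicted probability for the positive class.
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def grad_f(x):
    # Gradient of f with respect to the input x.
    p = f(x)
    return p * (1.0 - p) * w

def wachter_counterfactual(x, y_target, lam=1.0, lr=0.05, steps=500):
    # Wachter-style objective: lam * (f(x') - y_target)^2 + ||x' - x||_2^2.
    x_cf = x.copy()
    for _ in range(steps):
        pred_grad = 2.0 * (f(x_cf) - y_target) * grad_f(x_cf)
        dist_grad = 2.0 * (x_cf - x)
        x_cf = x_cf - lr * (lam * pred_grad + dist_grad)
    return x_cf

def cw_mse_adversarial(x, y_target, c=1.0, lr=0.05, steps=500):
    # C&W-style objective with squared-error loss: ||delta||_2^2 + c * (f(x + delta) - y_target)^2.
    delta = np.zeros_like(x)
    for _ in range(steps):
        pred_grad = 2.0 * (f(x + delta) - y_target) * grad_f(x + delta)
        delta = delta - lr * (2.0 * delta + c * pred_grad)
    return x + delta

x0 = np.array([0.5, 0.5])  # borderline input, f(x0) = 0.5
print(wachter_counterfactual(x0, y_target=1.0))
print(cw_mse_adversarial(x0, y_target=1.0))

With lam equal to c, the two objectives coincide up to the change of variables x' = x + delta, which is the kind of equivalence the abstract refers to for this pair of methods.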
Author Information
Martin Pawelczyk (University of Tuebingen)
Shalmali Joshi (Harvard University (SEAS))
Chirag Agarwal (Harvard University)
Sohini Upadhyay (Harvard University)
Hima Lakkaraju (Harvard)
More from the Same Authors
- 2021 : Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
  Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju
- 2021 : On the Connections between Counterfactual Explanations and Adversarial Examples »
  Martin Pawelczyk · Shalmali Joshi · Chirag Agarwal · Sohini Upadhyay · Hima Lakkaraju
- 2021 : Towards a Rigorous Theoretical Analysis and Evaluation of GNN Explanations »
  Chirag Agarwal · Marinka Zitnik · Hima Lakkaraju
- 2021 : What will it take to generate fairness-preserving explanations? »
  Jessica Dai · Sohini Upadhyay · Hima Lakkaraju
- 2021 : Feature Attributions and Counterfactual Explanations Can Be Manipulated »
  Dylan Slack · Sophie Hilgard · Sameer Singh · Hima Lakkaraju
- 2021 : Towards Robust and Reliable Algorithmic Recourse »
  Sohini Upadhyay · Shalmali Joshi · Hima Lakkaraju
- 2021 : CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms »
  Martin Pawelczyk · Sascha Bielawski · Gjergji Kasneci
- 2021 : Enhancing interpretability and reducing uncertainties in deep learning of electrocardiograms using a sub-waveform representation »
  Hossein Honarvar · Chirag Agarwal · Sulaiman Somani · Girish Nadkarni · Marinka Zitnik · Fei Wang · Benjamin Glicksberg
- 2021 : Reliable Post hoc Explanations: Modeling Uncertainty in Explainability »
  Dylan Slack · Sophie Hilgard · Sameer Singh · Hima Lakkaraju
- 2021 : Interpretable learning-to-defer for sequential decision-making »
  Shalmali Joshi · Sonali Parbhoo · Finale Doshi-Velez
- 2021 : Towards a Unified Framework for Fair and Stable Graph Representation Learning »
  Chirag Agarwal · Hima Lakkaraju · Marinka Zitnik
- 2021 : On formalizing causal off-policy sequential decision-making »
  Sonali Parbhoo · Shalmali Joshi · Finale Doshi-Velez
- 2022 : "Why did the Model Fail?": Attributing Model Performance Changes to Distribution Shifts »
  Haoran Zhang · Harvineet Singh · Shalmali Joshi
- 2023 Tutorial: Responsible AI for Generative AI in Practice: Lessons Learned and Open Challenges »
  Hima Lakkaraju · Krishnaram Kenthapadi · Nazneen Rajani
- 2022 Workshop: New Frontiers in Adversarial Machine Learning »
  Sijia Liu · Pin-Yu Chen · Dongxiao Zhu · Eric Wong · Kathrin Grosse · Hima Lakkaraju · Sanmi Koyejo
- 2021 Workshop: ICML Workshop on Algorithmic Recourse »
  Stratis Tsirtsis · Amir-Hossein Karimi · Ana Lucic · Manuel Gomez-Rodriguez · Isabel Valera · Hima Lakkaraju
- 2021 : Invited Speakers' Panel »
  Neeraja J Yadwadkar · Shalmali Joshi · Roberto Bondesan · Engineer Bainomugisha · Stephen Roberts
- 2021 : Towards Robust and Reliable Model Explanations for Healthcare »
  Hima Lakkaraju
- 2021 : Ethics of developing ML in healthcare »
  Shalmali Joshi
- 2021 Poster: Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
  Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju
- 2021 Spotlight: Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
  Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju
- 2020 Poster: Robust and Stable Black Box Explanations »
  Hima Lakkaraju · Nino Arsov · Osbert Bastani