As machine learning black boxes are increasingly being deployed in real-world applications, there has been a growing interest in developing post hoc explanations that summarize the behaviors of these black boxes. However, existing algorithms for generating such explanations have been shown to lack stability and robustness to distribution shifts. We propose a novel framework for generating robust and stable explanations of black box models based on adversarial training. Our framework optimizes a minimax objective that aims to construct the highest fidelity explanation with respect to the worst-case over a set of adversarial perturbations. We instantiate this algorithm for explanations in the form of linear models and decision sets by devising the required optimization procedures. To the best of our knowledge, this work makes the first attempt at generating post hoc explanations that are robust to a general class of adversarial perturbations that are of practical interest. Experimental evaluation with real-world and synthetic datasets demonstrates that our approach substantially improves robustness of explanations without sacrificing their fidelity on the original data distribution.
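The minimax idea above can be sketched in a few lines: the inner step searches a perturbation set for the shift that most degrades the linear explanation's fidelity to the black box, and the outer step refits the explanation against that worst case. This is a minimal illustrative sketch, not the paper's actual optimization procedures: the black box, the ℓ∞-ball perturbation set, and the random-search inner maximization are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black box to be explained (stands in for any opaque model).
def black_box(X):
    return np.tanh(X @ np.array([2.0, -1.0]))

X = rng.normal(size=(200, 2))  # points drawn from the "original" distribution

def fidelity_loss(w, b, Xp):
    # Squared error between the linear explanation and the black box on Xp.
    return np.mean((Xp @ w + b - black_box(Xp)) ** 2)

def worst_case_shift(w, b, X, radius=0.5, n_candidates=64):
    # Inner max: random search over input shifts delta with ||delta||_inf <= radius
    # for the one that most degrades explanation fidelity.
    deltas = rng.uniform(-radius, radius, size=(n_candidates, X.shape[1]))
    losses = [fidelity_loss(w, b, X + d) for d in deltas]
    return deltas[int(np.argmax(losses))]

# Outer min: refit the linear explanation on the original data augmented
# with its current worst-case shift, and alternate.
w, b = np.zeros(2), 0.0
for _ in range(10):
    d = worst_case_shift(w, b, X)
    Xa = np.vstack([X, X + d])
    A = np.hstack([Xa, np.ones((len(Xa), 1))])
    coef, *_ = np.linalg.lstsq(A, black_box(Xa), rcond=None)
    w, b = coef[:2], coef[2]
```

The resulting `w, b` trade a small amount of fidelity on the original points for fidelity that holds across the whole perturbation set, which is the trade-off the abstract describes.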
Author Information
Hima Lakkaraju (Harvard)
Nino Arsov (Macedonian Academy of Arts and Sciences)
Osbert Bastani (University of Pennsylvania)
More from the Same Authors
- 2021 : Towards the Unification and Robustness of Perturbation and Gradient Based Explanations » Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju
- 2021 : On the Connections between Counterfactual Explanations and Adversarial Examples » Martin Pawelczyk · Shalmali Joshi · Chirag Agarwal · Sohini Upadhyay · Hima Lakkaraju
- 2021 : Towards a Rigorous Theoretical Analysis and Evaluation of GNN Explanations » Chirag Agarwal · Marinka Zitnik · Hima Lakkaraju
- 2021 : What will it take to generate fairness-preserving explanations? » Jessica Dai · Sohini Upadhyay · Hima Lakkaraju
- 2021 : Feature Attributions and Counterfactual Explanations Can Be Manipulated » Dylan Slack · Sophie Hilgard · Sameer Singh · Hima Lakkaraju
- 2021 : Towards Robust and Reliable Algorithmic Recourse » Sohini Upadhyay · Shalmali Joshi · Hima Lakkaraju
- 2021 : Robust Generalization of Quadratic Neural Networks via Function Identification » Kan Xu · Hamsa Bastani · Osbert Bastani
- 2021 : Reliable Post hoc Explanations: Modeling Uncertainty in Explainability » Dylan Slack · Sophie Hilgard · Sameer Singh · Hima Lakkaraju
- 2021 : Towards a Unified Framework for Fair and Stable Graph Representation Learning » Chirag Agarwal · Hima Lakkaraju · Marinka Zitnik
- 2021 : Mind the Gap: Safely Bridging Offline and Online Reinforcement Learning » Wanqiao Xu · Kan Xu · Hamsa Bastani · Osbert Bastani
- 2021 : Improving Human Decision-Making with Machine Learning » Hamsa Bastani · Osbert Bastani · Wichinpong Sinchaisri
- 2023 : Fair Machine Unlearning: Data Removal while Mitigating Disparities » Alex Oesterling · Jiaqi Ma · Flavio Calmon · Hima Lakkaraju
- 2023 : Evaluating the Causal Reasoning Abilities of Large Language Models » Isha Puri · Hima Lakkaraju
- 2023 : TRAC: Trustworthy Retrieval Augmented Chatbot » Shuo Li · Sangdon Park · Insup Lee · Osbert Bastani
- 2023 : Himabindu Lakkaraju - Regulating Explainable AI: Technical Challenges and Opportunities » Hima Lakkaraju
- 2023 : Efficient Estimation of Local Robustness of Machine Learning Models » Tessa Han · Suraj Srinivas · Hima Lakkaraju
- 2023 Poster: PAC Prediction Sets for Large Language Models of Code » Adam Khakhar · Stephen Mell · Osbert Bastani
- 2023 Poster: LIV: Language-Image Representations and Rewards for Robotic Control » Yecheng Jason Ma · Vikash Kumar · Amy Zhang · Osbert Bastani · Dinesh Jayaraman
- 2023 Poster: Robust Subtask Learning for Compositional Generalization » Kishor Jothimurugan · Steve Hsu · Osbert Bastani · Rajeev Alur
- 2023 Tutorial: Responsible AI for Generative AI in Practice: Lessons Learned and Open Challenges » Krishnaram Kenthapadi · Hima Lakkaraju · Nazneen Rajani
- 2022 : Spotlight Presentations » Adrian Weller · Osbert Bastani · Jake Snell · Tal Schuster · Stephen Bates · Zhendong Wang · Margaux Zaffran · Danielle Rasooly · Varun Babbar
- 2022 Workshop: New Frontiers in Adversarial Machine Learning » Sijia Liu · Pin-Yu Chen · Dongxiao Zhu · Eric Wong · Kathrin Grosse · Hima Lakkaraju · Sanmi Koyejo
- 2022 Poster: Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching » Yecheng Jason Ma · Andrew Shen · Dinesh Jayaraman · Osbert Bastani
- 2022 Spotlight: Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching » Yecheng Jason Ma · Andrew Shen · Dinesh Jayaraman · Osbert Bastani
- 2022 Poster: Understanding Robust Generalization in Learning Regular Languages » Soham Dan · Osbert Bastani · Dan Roth
- 2022 Spotlight: Understanding Robust Generalization in Learning Regular Languages » Soham Dan · Osbert Bastani · Dan Roth
- 2022 Poster: Sequential Covariate Shift Detection Using Classifier Two-Sample Tests » Sooyong Jang · Sangdon Park · Insup Lee · Osbert Bastani
- 2022 Spotlight: Sequential Covariate Shift Detection Using Classifier Two-Sample Tests » Sooyong Jang · Sangdon Park · Insup Lee · Osbert Bastani
- 2021 Workshop: ICML Workshop on Algorithmic Recourse » Stratis Tsirtsis · Amir-Hossein Karimi · Ana Lucic · Manuel Gomez-Rodriguez · Isabel Valera · Hima Lakkaraju
- 2021 : Towards Robust and Reliable Model Explanations for Healthcare » Hima Lakkaraju
- 2021 Poster: Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings » Kan Xu · Xuanyi Zhao · Hamsa Bastani · Osbert Bastani
- 2021 Poster: Towards the Unification and Robustness of Perturbation and Gradient Based Explanations » Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju
- 2021 Spotlight: Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings » Kan Xu · Xuanyi Zhao · Hamsa Bastani · Osbert Bastani
- 2021 Spotlight: Towards the Unification and Robustness of Perturbation and Gradient Based Explanations » Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju
- 2020 Poster: Generating Programmatic Referring Expressions via Program Synthesis » Jiani Huang · Calvin Smith · Osbert Bastani · Rishabh Singh · Aws Albarghouthi · Mayur Naik
- 2019 Poster: Learning Neurosymbolic Generative Models via Program Synthesis » Halley R Young · Osbert Bastani · Mayur Naik
- 2019 Oral: Learning Neurosymbolic Generative Models via Program Synthesis » Halley R Young · Osbert Bastani · Mayur Naik