As Reinforcement Learning (RL) agents are increasingly deployed in diverse decision-making problems guided by reward preferences, it becomes important to ensure that the policies these frameworks learn, which map observations to a probability distribution over possible actions, are explainable. However, there is little to no work on systematically understanding these complex policies in a contrastive manner, i.e., on identifying what minimal changes to a policy would improve or worsen its performance to a desired level. In this work, we present COUNTERPOL, the first framework to analyze RL policies using counterfactual explanations, in the form of minimal changes to the policy that lead to a desired outcome. We do so by incorporating counterfactual reasoning from supervised learning into RL, with the target outcome regulated through the desired return. We establish a theoretical connection between COUNTERPOL and widely used trust region-based policy optimization methods in RL. Extensive empirical analysis shows the efficacy of COUNTERPOL in generating explanations for (un)learning skills while staying close to the original policy. Our results on five RL environments with diverse state and action spaces demonstrate the utility of counterfactual explanations, paving the way for new frontiers in designing and developing counterfactual policies.
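At its core, this formulation can be read as a constrained optimization problem: find a counterfactual policy that stays close to the original policy (e.g., in KL divergence, echoing the trust-region connection above) while its expected return reaches a desired target. The sketch below illustrates that reading on a toy tabular MDP with exact policy evaluation; the two-state MDP, the penalty weight `lam`, and all names here are illustrative assumptions, not the COUNTERPOL implementation.

```python
# Minimal sketch (not the authors' code) of counterfactual policy search:
# find pi' minimizing KL(pi' || pi) subject to J(pi') matching a target return,
# solved here with a soft squared penalty and gradient descent.
import torch

torch.manual_seed(0)
S, A, gamma = 2, 2, 0.9  # toy problem sizes (assumed)

# Known tabular MDP: P[s, a] is the next-state distribution, R[s, a] the reward.
P = torch.tensor([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.7, 0.3], [0.1, 0.9]]])
R = torch.tensor([[1.0, 0.0], [0.0, 2.0]])
mu0 = torch.tensor([1.0, 0.0])               # initial state distribution

def policy(logits):
    return torch.softmax(logits, dim=-1)      # pi(a|s), shape (S, A)

def exact_return(pi):
    """J(pi) via closed-form evaluation: v = (I - gamma * P_pi)^-1 r_pi."""
    P_pi = torch.einsum('sa,sat->st', pi, P)  # state-to-state kernel under pi
    r_pi = (pi * R).sum(dim=-1)               # expected per-state reward
    v = torch.linalg.solve(torch.eye(S) - gamma * P_pi, r_pi)
    return mu0 @ v

base_logits = torch.zeros(S, A)               # uniform base policy pi
pi_base = policy(base_logits)
J_base = exact_return(pi_base).item()
R_target = J_base + 2.0                       # desired (higher) return

cf_logits = base_logits.clone().requires_grad_(True)
opt = torch.optim.Adam([cf_logits], lr=0.05)
lam = 10.0                                    # penalty weight (assumed)

for step in range(500):
    pi_cf = policy(cf_logits)
    # Stay close to the base policy while steering the return to the target.
    kl = (pi_cf * (pi_cf / pi_base).log()).sum()
    loss = kl + lam * (exact_return(pi_cf) - R_target) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    J_cf = exact_return(policy(cf_logits)).item()
print(f"base return {J_base:.3f} -> counterfactual {J_cf:.3f} "
      f"(target {R_target:.3f}), KL to base {kl.item():.4f}")
```

Replacing the soft squared penalty with a hard KL budget would turn this into a trust-region-style update, which is the flavor of connection the abstract alludes to.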
Author Information
Shripad Deshmukh
Srivatsan R (Indian Institute of Technology, Madras)
Supriti Vijay (Manipal Institute Of Technology)
Jayakumar Subramanian (Adobe)
Chirag Agarwal (Harvard University)