Workshop: XXAI: Extending Explainable AI Beyond Deep Models and Classifiers
Contributed Talk 4: Yau et al. - What did you think would happen? Explaining Agent Behaviour through Intended Outcomes
We present a novel form of explanation for Reinforcement Learning (RL), based on the notion of intended outcome: the outcome an agent is trying to achieve through its actions. Given this definition, we provide a simple proof that general methods for post-hoc explanations of this kind are impossible in traditional reinforcement learning. Instead, the information needed for the explanations must be collected in conjunction with training the agent. We provide approaches designed to do this for several variants of Q-function approximation and prove that the explanations are consistent with the learned Q-values. We demonstrate our method on multiple reinforcement learning problems.
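To make "collected in conjunction with training" and "consistency with the learned Q-values" concrete, here is a minimal sketch for the tabular case only. It is an illustrative assumption, not the authors' published algorithm: it represents the intended outcome as a successor-representation-style table `H` of discounted expected state-action visits, updated in lockstep with Q-learning (same learning rate, same bootstrapped greedy action). If the reward is a known function r(s, a) and both tables start at zero, then by linearity of the two updates the identity Q(s, a) = Σ_j H[(s, a)][j] · r(j) holds after every step. The chain environment and the names `H`, `train`, `intended_outcome` and `consistency_gap` are hypothetical.

```python
import random

# Hypothetical toy MDP (an assumption, not from the talk): a deterministic chain.
# Action 0 moves left, action 1 moves right; entering the last state ends the
# episode. The reward is a known function of (state, action).
N_STATES, N_ACTIONS = 5, 2
N_PAIRS = N_STATES * N_ACTIONS
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.2

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
# H[i][j]: learned discounted expected number of visits to state-action pair j
# after taking pair i (a successor-representation-style "belief map").
H = [[0.0] * N_PAIRS for _ in range(N_PAIRS)]


def pair(s, a):
    """Flatten (state, action) into a single index."""
    return s * N_ACTIONS + a


def reward(s, a):
    # +1 only for the move that enters the goal state.
    return 1.0 if (s == N_STATES - 2 and a == 1) else 0.0


def step(s, a):
    """Return (next_state, reward, done) for the chain."""
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s_next, reward(s, a), s_next == N_STATES - 1


def train(episodes=2000, max_steps=50, seed=0):
    rng = random.Random(seed)
    for _ in range(episodes):
        s = rng.randrange(N_STATES - 1)  # random non-terminal start state
        for _ in range(max_steps):
            # Epsilon-greedy behaviour policy with random tie-breaking.
            if rng.random() < EPSILON:
                a = rng.randrange(N_ACTIONS)
            else:
                best = max(Q[s])
                a = rng.choice([b for b in range(N_ACTIONS) if Q[s][b] == best])
            s_next, r, done = step(s, a)
            i = pair(s, a)
            if done:
                q_target, h_target = r, [0.0] * N_PAIRS
            else:
                # Both tables bootstrap from the SAME greedy next action.
                a_star = max(range(N_ACTIONS), key=lambda b: Q[s_next][b])
                q_target = r + GAMMA * Q[s_next][a_star]
                h_target = [GAMMA * v for v in H[pair(s_next, a_star)]]
            h_target[i] += 1.0  # count the visit to (s, a) itself
            # Same learning rate for both updates keeps them in lockstep.
            Q[s][a] += ALPHA * (q_target - Q[s][a])
            H[i] = [h + ALPHA * (t - h) for h, t in zip(H[i], h_target)]
            if done:
                break
            s = s_next


def intended_outcome(s, a):
    """Per-state discounted expected visits the agent anticipates after (s, a)."""
    row = H[pair(s, a)]
    return [sum(row[pair(x, b)] for b in range(N_ACTIONS)) for x in range(N_STATES)]


def consistency_gap():
    """Largest |Q(s,a) - sum_j H[(s,a)][j] * r(j)| over all pairs."""
    gap = 0.0
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            implied = sum(H[pair(s, a)][j] * reward(*divmod(j, N_ACTIONS))
                          for j in range(N_PAIRS))
            gap = max(gap, abs(Q[s][a] - implied))
    return gap


if __name__ == "__main__":
    train()
    print("consistency gap:", consistency_gap())
    print("intended outcome of moving right from state 0:",
          [round(v, 3) for v in intended_outcome(0, 1)])
```

The consistency gap should stay at floating-point rounding level throughout training, which is the property that a table computed post hoc from Q alone could not guarantee. Once the greedy policy settles on moving right, `intended_outcome(0, 1)` should approach roughly 1, γ, γ², γ³ for states 0 to 3: the states the agent expects to bring about by that action.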