
Workshop: Responsible Decision Making in Dynamic Environments

A Decision Metric for the Use of a Deep Reinforcement Learning Policy

Christina Selby · Edward Staley


Uncertainty estimation techniques such as those of Osband et al. (2018) and Burda et al. (2019) have been shown to be useful for efficient exploration during training. This paper demonstrates that such techniques can also be used as part of a time-series-based methodology for out-of-distribution (OOD) detection for an offline, model-free deep reinforcement learning policy. In particular, it defines a "decision metric" that can be used to determine when another decision-making process should be used in place of the deep reinforcement learning policy.
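The abstract does not spell out how the decision metric is computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that per-step uncertainty is measured as disagreement among an ensemble of value estimates, that the time-series component is a simple rolling mean, and that a fixed threshold triggers the hand-off. The names `ensemble_disagreement` and `DecisionMetric`, and the `window` and `threshold` parameters, are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def ensemble_disagreement(q_values_per_member):
    """Hypothetical uncertainty proxy for one time step.

    q_values_per_member: list of per-member lists of action values.
    Returns the population std. dev. across members, averaged over actions.
    """
    n_actions = len(q_values_per_member[0])
    per_action = [
        pstdev([member[a] for member in q_values_per_member])
        for a in range(n_actions)
    ]
    return mean(per_action)


class DecisionMetric:
    """Assumed form of a decision metric: rolling mean of uncertainty."""

    def __init__(self, window, threshold):
        self.scores = deque(maxlen=window)  # keeps only the last `window` scores
        self.threshold = threshold

    def update(self, uncertainty):
        """Record one step's uncertainty and return the current rolling mean."""
        self.scores.append(uncertainty)
        return mean(self.scores)

    def use_policy(self):
        """True while the rolling mean stays at or below the threshold."""
        return not self.scores or mean(self.scores) <= self.threshold


# Illustrative inputs: three ensemble members, two actions each.
in_distribution = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]]   # members agree
out_of_distribution = [[1.0, 2.0], [4.0, -1.0], [-2.0, 5.0]]  # members disagree

metric = DecisionMetric(window=3, threshold=0.5)
metric.update(ensemble_disagreement(in_distribution))
keep_policy_first = metric.use_policy()
metric.update(ensemble_disagreement(out_of_distribution))
keep_policy_second = metric.use_policy()
```

Smoothing over a window rather than reacting to a single step is one way to keep a momentary spike in uncertainty from triggering a hand-off. The window length and threshold would have to be tuned on in-distribution data for any real policy.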
