

Poster in Workshop: Sampling and Optimization in Discrete Space

Finite-state Offline Reinforcement Learning with Moment-based Bayesian Epistemic and Aleatoric Uncertainties

Filippo Valdettaro · Aldo Faisal


Abstract:

Reinforcement learning (RL) agents can learn complex sequential decision-making and control strategies, often exceeding human expert performance. In real-world deployment, it is essential from a risk, safety-critical, and human-interaction perspective that agents communicate how confident or uncertain they are about the outcomes of their actions and account for this in their decision-making. We assemble here a complete pipeline for modelling uncertainty in the finite, discrete-state setting of offline RL. First, we use methods from Bayesian RL to capture the posterior uncertainty in environment model parameters given the available data. Next, we compute exact values for the return distribution's standard deviation, taken as the measure of uncertainty, for given samples from the environment posterior (without requiring quantile-based or similar approximations from conventional distributional RL), which allows us to decompose the agent's uncertainty into its epistemic and aleatoric components more efficiently than previous approaches. Building on this, we construct an RL agent that quantifies both types of uncertainty and utilises its epistemic uncertainty belief to inform its optimal policy through a novel stochastic gradient-based optimisation process. We illustrate the improved uncertainty quantification and Bayesian value optimisation performance of our agent in simple, interpretable gridworlds, confirm its scalability by applying it to a clinical decision support system (the AI Clinician) that makes real-time recommendations for sepsis treatment in intensive care units, and address the limitations that arise with inference for larger-scale MDPs by proposing a sparse, conservative dynamics model.
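To make the decomposition concrete, the following is a minimal sketch, not the authors' implementation: it assumes a tabular MDP with a Dirichlet posterior over transition probabilities, computes the exact per-model return variance for a fixed policy via a variance Bellman equation (in the style of Sobel, 1982), and splits total uncertainty into aleatoric and epistemic parts with the law of total variance over posterior samples. All function and variable names (`sample_transition_model`, `return_mean_and_var`, `decompose_uncertainty`) are illustrative.

```python
import numpy as np

def sample_transition_model(counts, rng, prior=1.0):
    """Draw P(s'|s,a) from a Dirichlet posterior given observed transition counts."""
    S, A, _ = counts.shape
    P = np.empty(counts.shape, dtype=float)
    for s in range(S):
        for a in range(A):
            P[s, a] = rng.dirichlet(counts[s, a] + prior)
    return P

def return_mean_and_var(P, R, policy, gamma):
    """Exact value V and return variance W under a fixed policy for one sampled MDP.

    P: (S, A, S) transition probabilities, R: (S, A, S) rewards,
    policy: (S, A) action probabilities.
    """
    S = P.shape[0]
    # Policy-averaged transition matrix and expected one-step reward.
    P_pi = np.einsum("sa,sat->st", policy, P)
    r_pi = np.einsum("sa,sat,sat->s", policy, P, R)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    # One-step squared deviation E[(r + gamma*V(s') - V(s))^2 | s].
    dev2 = (R + gamma * V[None, None, :] - V[:, None, None]) ** 2
    delta2 = np.einsum("sa,sat,sat->s", policy, P, dev2)
    # Variance Bellman equation: W = delta2 + gamma^2 * P_pi @ W, solved exactly.
    W = np.linalg.solve(np.eye(S) - gamma**2 * P_pi, delta2)
    return V, W

def decompose_uncertainty(counts, R, policy, gamma, n_samples=200, seed=0):
    """Aleatoric: posterior mean of per-model return variance.
    Epistemic: posterior variance of per-model return mean."""
    rng = np.random.default_rng(seed)
    means, variances = [], []
    for _ in range(n_samples):
        P = sample_transition_model(counts, rng)
        V, W = return_mean_and_var(P, R, policy, gamma)
        means.append(V)
        variances.append(W)
    means, variances = np.array(means), np.array(variances)
    return variances.mean(axis=0), means.var(axis=0)
```

Because each per-model return variance is obtained by solving a linear system rather than by sampling rollouts or fitting return quantiles, the only Monte Carlo approximation left in this sketch is over posterior samples of the dynamics, which mirrors the efficiency argument made in the abstract.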
