

Kernel-Based Reinforcement Learning in Robust Markov Decision Processes

Shiau Hong Lim · Arnaud Autef

Pacific Ballroom #120

Keywords: [ Theory and Algorithms ] [ Safety ] [ Robust Statistics and Machine Learning ]


The robust Markov decision process (MDP) framework aims to address the problem of parameter uncertainty due to model mismatch, approximation errors, or even adversarial behavior. It is especially relevant when deploying learned policies in real-world applications. Scaling the robust MDP framework up to large or continuous state spaces remains a challenging problem. Function approximation is usually unavoidable in this setting, which can only amplify the problems of model mismatch and parameter uncertainty. It has previously been shown that, for MDPs with state aggregation, robust policies enjoy a tighter performance bound than standard solutions because of their reduced sensitivity to approximation errors. We extend these results to the much larger class of kernel-based approximators and show, both analytically and empirically, that robust policies can significantly outperform their non-robust counterparts.
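
For readers unfamiliar with the robust MDP setting, the following is a minimal sketch of worst-case (robust) value iteration. It is not the kernel-based method of the paper: it assumes a small tabular MDP and a finite set of candidate transition models from which an adversary chooses independently for each state-action pair. The function name `robust_value_iteration` and its parameters are hypothetical, introduced only for illustration.

```python
import numpy as np

def robust_value_iteration(P_set, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Worst-case value iteration over a finite uncertainty set (illustrative).

    P_set: array of shape (K, S, A, S), K candidate transition models (assumed).
    R:     array of shape (S, A), rewards.
    Returns the robust value function V (shape S) and a greedy policy (shape S).
    """
    K, S, A, _ = P_set.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Expected next-state value under each candidate model: shape (K, S, A).
        EV = P_set @ V
        # The adversary picks the worst model separately for each (s, a).
        Q = R + gamma * EV.min(axis=0)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, S, A = 3, 4, 2
    # Random candidate models; each row over next states sums to 1.
    P_set = rng.dirichlet(np.ones(S), size=(K, S, A))
    R = rng.uniform(size=(S, A))
    V_robust, policy = robust_value_iteration(P_set, R)
    print("robust values:", V_robust)
    print("greedy policy:", policy)
```

With a single candidate model (K = 1) the minimum is trivial and the procedure reduces to standard value iteration; with several models, the robust values can never exceed the values obtained under any one model alone.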
