
High Confidence Generalization for Reinforcement Learning
James Kostas · Yash Chandak · Scott Jordan · Georgios Theocharous · Philip Thomas

Tue Jul 20 09:00 PM -- 11:00 PM (PDT)

We present several classes of reinforcement learning algorithms that safely generalize to Markov decision processes (MDPs) not seen during training. Specifically, we study the setting in which some set of MDPs is accessible for training. The goal is to generalize safely to MDPs that are sampled from the same distribution but may not be in the training set. For various definitions of safety, our algorithms give probabilistic guarantees of such generalization. These algorithms are a type of Seldonian algorithm (Thomas et al., 2019), a class of machine learning algorithms that return models with probabilistic safety guarantees for user-specified definitions of safety.
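
To make the idea of a probabilistic safety guarantee concrete, the following is a minimal sketch of a high-confidence safety test of the kind Seldonian algorithms commonly use. It is not the method from the paper. It assumes that a candidate policy has been evaluated on held-out MDPs drawn i.i.d. from the training distribution, that each MDP yields one performance value in [0, 1], and that "safe" means the expected performance on a new MDP is at least some threshold. The names `hoeffding_lower_bound`, `safety_test`, `threshold`, and `delta` are hypothetical, and Hoeffding's inequality is swapped in here as one simple choice of confidence bound.

```python
import math


def hoeffding_lower_bound(values, delta, value_range=1.0):
    """One-sided (1 - delta) lower confidence bound on the mean.

    Assumes `values` are i.i.d. samples whose range has width `value_range`.
    """
    n = len(values)
    mean = sum(values) / n
    # Hoeffding: P(sample_mean - true_mean >= t) <= exp(-2 n t^2 / range^2)
    margin = value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return mean - margin


def safety_test(performances, threshold, delta):
    """Accept the candidate policy only if, with confidence 1 - delta,
    its expected performance on a freshly sampled MDP is >= threshold.

    `performances` holds one value in [0, 1] per held-out MDP (an
    assumption of this sketch). Returning False corresponds to declining
    to return a solution.
    """
    return hoeffding_lower_bound(performances, delta) >= threshold


if __name__ == "__main__":
    many_mdps = [0.9] * 200  # policy evaluated on 200 held-out MDPs
    few_mdps = [0.9] * 10    # same average, far fewer MDPs
    print(safety_test(many_mdps, threshold=0.8, delta=0.05))
    print(safety_test(few_mdps, threshold=0.8, delta=0.05))
```

With the same average performance, the test passes when many held-out MDPs are available and fails when only a few are, because the confidence margin shrinks as the number of sampled MDPs grows. Under these assumptions, an unsafe policy is accepted with probability at most `delta`.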

Author Information

James Kostas (University of Massachusetts Amherst)
Yash Chandak (University of Massachusetts Amherst)
Scott Jordan (University of Massachusetts)
Georgios Theocharous (Adobe Research)
Philip Thomas (University of Massachusetts Amherst)
