Oral
On the Generalization Gap in Reparameterizable Reinforcement Learning
Huan Wang · Stephan Zheng · Caiming Xiong · Richard Socher

Wed Jun 12th 02:25 -- 02:30 PM @ Hall B

Understanding generalization in reinforcement learning (RL) is a significant challenge, as many common assumptions of traditional supervised learning theory do not apply. We argue that the gap between training and testing performance of RL agents is caused by two types of errors: intrinsic error due to the randomness of the environment and an agent's policy, and external error due to the change in the environment distribution. We focus on the special class of reparameterizable RL problems, where the trajectory distribution can be decomposed using the reparametrization trick. For this problem class, estimating the expected reward is efficient and does not require costly trajectory re-sampling. This enables us to study reparametrizable RL using supervised learning and transfer learning theory. Our bound suggests that the generalization capability of reparameterizable RL is related to multiple factors, including the "smoothness" of the environment transition, the reward, and the agent's policy function class. We also empirically verify the relationship between the generalization gap and these factors through simulations.
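The reparametrization trick mentioned in the abstract can be illustrated with a minimal sketch (not taken from the paper): a stochastic quantity is rewritten as a deterministic function of the parameters and an independent noise variable, so a single set of fixed noise draws can be reused to evaluate the expected reward under different parameters without re-sampling. The Gaussian policy and quadratic reward below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(state):
    # Illustrative reward: higher for states near 1.0.
    return -(state - 1.0) ** 2

def expected_reward(mu, sigma, eps):
    # Reparameterize: state = mu + sigma * eps with eps ~ N(0, 1),
    # so the randomness is isolated in eps and the sample is a
    # deterministic function of the parameters (mu, sigma).
    states = mu + sigma * eps
    return reward(states).mean()

# Draw the noise once; reuse the same draws for every parameter setting,
# avoiding costly re-sampling when parameters change.
eps = rng.standard_normal(10_000)

r1 = expected_reward(mu=0.0, sigma=1.0, eps=eps)
r2 = expected_reward(mu=1.0, sigma=0.5, eps=eps)
```

Because the noise is shared across evaluations, comparisons between parameter settings (here, `r2 > r1`) are low-variance, which is the property that makes expected-reward estimation efficient for this problem class.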