

Poster

Meta Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-context Learning

Ye Xu · Zihao Li · Qinyuan Ren


Abstract:

A key challenge in Meta Reinforcement Learning (RL) is task distribution shift, since the generalization ability of most current meta RL methods is limited to tasks sampled from the training distribution. In this paper, we propose Posterior Sampling Bayesian Lifelong In-Context Reinforcement Learning (PSBL), which is robust to task distribution shift. PSBL meta-trains a variant of the transformer to directly perform amortized inference of the Predictive Posterior Distribution (PPD) of the optimal policy. Once trained, the network can infer the PPD online with frozen parameters. The agent then samples actions from the approximate PPD to perform online exploration, progressively reducing uncertainty and improving its performance even on Out-of-Distribution (OOD) tasks. This property is known as 'in-context learning'. Experimental results demonstrate that PSBL significantly outperforms standard meta RL methods on tasks with both sparse and dense rewards when the test task distribution is strictly shifted from the training distribution.
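To make the in-context control loop concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of posterior-sampling action selection with a frozen, meta-trained network. The names `ppd_net`, `ToyBanditEnv`, and `in_context_episode` are assumptions for illustration; the real PPD network would be the meta-trained transformer, stubbed here with random logits so the example runs end to end.

```python
# Hypothetical sketch of the PSBL-style in-context loop described in the abstract.
# No gradient updates occur at test time; adaptation happens only through the
# growing context that conditions the (frozen) PPD network.

import numpy as np

rng = np.random.default_rng(0)
NUM_ACTIONS = 4


def ppd_net(context):
    """Stand-in for the frozen meta-trained transformer.

    Given the interaction history, it would return the approximate predictive
    posterior distribution (PPD) over the optimal action. Here it emits random
    logits purely so the sketch is runnable.
    """
    logits = rng.normal(size=NUM_ACTIONS)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()


class ToyBanditEnv:
    """Tiny stationary bandit used only to exercise the loop."""

    def __init__(self):
        self.mean_rewards = rng.uniform(size=NUM_ACTIONS)

    def step(self, action):
        return rng.normal(loc=self.mean_rewards[action], scale=0.1)


def in_context_episode(env, horizon=50):
    """Posterior sampling with frozen weights: explore where the PPD is uncertain."""
    context = []          # growing history of (action, reward) pairs
    total_reward = 0.0
    for _ in range(horizon):
        probs = ppd_net(context)              # amortized inference, no parameter updates
        action = rng.choice(NUM_ACTIONS, p=probs)
        reward = env.step(action)
        context.append((action, reward))      # conditioning set for the next inference
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    print("episode return:", in_context_episode(ToyBanditEnv()))
```

In this sketch, robustness to distribution shift would come from the conditioning on the context rather than from any test-time training: a better-calibrated PPD concentrates as evidence accumulates, which is what the abstract refers to as progressively reducing uncertainty.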
