

Poster in Workshop: New Frontiers in Learning, Control, and Dynamical Systems

Informed POMDP: Leveraging Additional Information in Model-Based RL

Gaspard Lambrechts · Adrien Bolland · Damien Ernst


Abstract:

In this work, we generalize the problem of learning through interaction in a POMDP by accounting for additional information that may be available at training time. First, we introduce the informed POMDP, a new learning paradigm offering a clear distinction between the training information and the execution observation. Next, we propose an objective that leverages this information to learn a sufficient statistic of the history for optimal control. We then show that optimizing this informed objective amounts to learning an environment model from which latent trajectories can be sampled. Finally, we show that using this informed environment model in the Dreamer algorithm sometimes greatly improves the convergence speed of the policies in several environments. These results and the simplicity of the proposed adaptation advocate for systematically considering any additional information available at training time when learning in a POMDP with model-based RL.
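To make the idea of the informed objective concrete, the sketch below shows one possible way to train a recurrent statistic of the action-observation history so that it is predictive of additional training-time information (for example, the full simulator state) and of the reward. This is only an illustrative sketch under assumptions: the class names, architecture, and losses are hypothetical and deliberately simplified, and they do not reproduce the authors' actual Dreamer-based latent variable model or its variational objective.

```python
import torch
import torch.nn as nn

class InformedWorldModel(nn.Module):
    """Illustrative sketch (not the authors' implementation): a GRU encodes
    the action-observation history into a latent statistic z_t, and decoders
    predict the additional training-time information i_t and the reward r_t
    from z_t, rather than reconstructing the observation alone."""

    def __init__(self, obs_dim, act_dim, info_dim, latent_dim=128):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, latent_dim, batch_first=True)
        self.info_head = nn.Linear(latent_dim, info_dim)  # predicts i_t
        self.reward_head = nn.Linear(latent_dim, 1)       # predicts r_t

    def forward(self, obs, act):
        # obs: (B, T, obs_dim), act: (B, T, act_dim)
        z, _ = self.rnn(torch.cat([obs, act], dim=-1))
        return z, self.info_head(z), self.reward_head(z).squeeze(-1)


def informed_loss(model, obs, act, info, reward):
    """Train the history statistic to be predictive of the additional
    information and the reward; `info` is only required at training time,
    so the policy can still act from observations alone at execution."""
    _, info_pred, reward_pred = model(obs, act)
    return (nn.functional.mse_loss(info_pred, info)
            + nn.functional.mse_loss(reward_pred, reward))


# Usage with random tensors, shapes only.
B, T, obs_dim, act_dim, info_dim = 8, 32, 16, 4, 24
model = InformedWorldModel(obs_dim, act_dim, info_dim)
loss = informed_loss(
    model,
    torch.randn(B, T, obs_dim),
    torch.randn(B, T, act_dim),
    torch.randn(B, T, info_dim),
    torch.randn(B, T),
)
loss.backward()
```

The key design point, in line with the abstract, is that the extra supervision signal only enters the training loss: at execution time the recurrent statistic is computed from observations and actions alone, so the learned policy remains a valid POMDP policy.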
