VIPer: Iterative Value-Aware Model Learning on the Value Improvement Path
Romina Abachi · Claas Voelcker · Animesh Garg · Amir-massoud Farahmand
Event URL: https://openreview.net/forum?id=qLuxVmnB7Gg

We propose a practical and generalizable Decision-Aware Model-Based Reinforcement Learning algorithm. We extend the frameworks of VAML (Farahmand et al., 2017) and IterVAML (Farahmand, 2018), which have been shown to be difficult to scale to high-dimensional and continuous environments (Lovatto et al., 2020a; Modhe et al., 2021; Voelcker et al., 2022). We propose to use the notion of the Value Improvement Path (Dabney et al., 2020) to improve the generalization of VAML-like model learning. We show theoretically for linear and tabular spaces that our proposed algorithm is sensible, justifying extension to non-linear and continuous spaces. We also present a detailed implementation proposal based on these ideas.

Author Information

Romina Abachi (Department of Computer Science, University of Toronto)
Claas Voelcker (University of Toronto)
Animesh Garg (University of Toronto, Vector Institute, Nvidia)
Amir-massoud Farahmand (Vector Institute & University of Toronto)