Spotlight

Constrained Offline Policy Optimization

Nicholas Polosky ⋅ Bruno C. da Silva ⋅ Madalina Fiterau ⋅ Jithin Jagannath

Keywords: RL: Batch/Offline RL: Policy Search

2022 Spotlight

Abstract

We introduce Constrained Offline Policy Optimization (COPO), an offline policy optimization algorithm for learning in MDPs with cost constraints. COPO is built upon a novel offline cost-projection method, which we formally derive and analyze. Our method improves upon the state of the art in offline constrained policy optimization by explicitly accounting for distributional shift and by offering non-asymptotic confidence bounds on the cost of a policy. These formal properties are superior to those of existing techniques, which only guarantee convergence to a point estimate. We empirically demonstrate that COPO achieves state-of-the-art performance on discrete and continuous control problems while offering these stronger theoretical guarantees.
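
To make the idea of a non-asymptotic confidence bound on a policy's cost concrete, here is a minimal Python sketch. It is not COPO's cost-projection method, which the abstract does not specify. It is a generic illustration that combines ordinary importance sampling (to correct for the shift between the behavior policy and the evaluated policy) with a one-sided Hoeffding bound. The function names, the `(state, action, cost)` trajectory layout, and the assumption that the weighted costs lie in `[0, upper]` are all hypothetical choices made for this example.

```python
import math

def is_weighted_costs(trajectories, pi_e, pi_b, gamma):
    """Per-trajectory importance-weighted discounted cost.

    trajectories: list of trajectories, each a list of (state, action, cost).
    pi_e(a, s), pi_b(a, s): action probabilities under the evaluated and
    behavior policies (hypothetical callables; pi_b must be > 0 on the data).
    """
    estimates = []
    for traj in trajectories:
        weight, cost, discount = 1.0, 0.0, 1.0
        for state, action, c in traj:
            weight *= pi_e(action, state) / pi_b(action, state)
            cost += discount * c
            discount *= gamma
        estimates.append(weight * cost)
    return estimates

def hoeffding_upper_bound(samples, upper, delta):
    """One-sided Hoeffding bound on the mean of i.i.d. samples in [0, upper].

    With probability at least 1 - delta, the true mean is at most the
    returned value. The bound holds for any finite sample size.
    """
    n = len(samples)
    mean = sum(samples) / n
    return mean + upper * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
```

Under these assumptions, a candidate policy could be treated as satisfying a cost limit `d` only when `hoeffding_upper_bound(...) <= d`, rather than when the point estimate alone falls below `d`. Hoeffding bounds are loose when importance weights are large, so this is a baseline illustration only.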
