

Poster in Workshop: Reinforcement Learning for Real Life

A Policy Efficient Reduction Approach to Convex Constrained Deep Reinforcement Learning

Tianchi Cai · Wenpeng Zhang · Lihong Gu · Xiaodong Zeng · Jinjie Gu


Abstract:

Although well-established in general reinforcement learning (RL), value-based methods are rarely explored in constrained RL (CRL) because of their inability to find policies that randomize among multiple actions. To apply value-based methods to CRL, a recent groundbreaking line of game-theoretic approaches uses a mixed policy that randomizes among a set of carefully generated policies to converge to the desired constraint-satisfying policy. However, these approaches require storing a large set of policies, which is not policy efficient and may incur prohibitive memory costs in large-scale applications. To address this problem, we propose an alternative approach. Our approach first reformulates the CRL problem into an equivalent distance optimization problem. With a specially designed linear optimization oracle, we derive a meta-algorithm that solves it using any off-the-shelf RL algorithm and any conditional gradient (CG) type algorithm as subroutines. We then propose a new variant of the CG-type algorithm, which generalizes the minimum norm point (MNP) method. The proposed method matches the convergence rate of the existing game-theoretic approaches and achieves the worst-case optimal policy efficiency. Experiments on a navigation task show that our method reduces memory costs by an order of magnitude while achieving better performance, demonstrating both its effectiveness and efficiency.
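To make the reduction concrete, below is a minimal Python sketch of the conditional gradient template described in the abstract: a CG loop over mixed policies whose linear optimization oracle is realized by an off-the-shelf RL algorithm trained on a scalarized reward. The names rl_oracle and project_onto_C, the dimensions, and the plain Frank-Wolfe step are illustrative assumptions rather than the paper's actual algorithm; the proposed MNP-style variant instead re-optimizes the mixture weights and discards policies whose weight reaches zero, which is what keeps the stored policy set small.

import numpy as np

def cg_meta_algorithm(rl_oracle, project_onto_C, dim, n_iters=50, tol=1e-6):
    # Sketch of a conditional-gradient (Frank-Wolfe style) reduction for
    # convex constrained RL. All names here are illustrative assumptions.
    #
    # rl_oracle(w)      : trains a policy that maximizes the scalarized reward
    #                     -<w, measurement> and returns its expected
    #                     measurement (reward/cost) vector of length `dim`.
    # project_onto_C(z) : Euclidean projection of z onto the convex
    #                     constraint set C.
    vertices = [rl_oracle(np.zeros(dim))]   # measurement vectors of stored policies
    weights = [1.0]                         # mixture weights of the mixed policy

    for t in range(n_iters):
        # Expected measurement vector of the current mixed policy.
        z = sum(w * v for w, v in zip(weights, vertices))
        # Gradient of the distance objective 0.5 * dist(z, C)^2.
        grad = z - project_onto_C(z)
        if np.linalg.norm(grad) < tol:      # z is (approximately) inside C
            break
        # Linear optimization oracle, realized by an off-the-shelf RL algorithm.
        v_new = rl_oracle(grad)
        # Plain Frank-Wolfe step; an MNP-style variant would instead
        # re-optimize the weights over all stored vertices and drop any
        # policy whose weight reaches zero, keeping the policy set small.
        gamma = 2.0 / (t + 2.0)
        weights = [(1.0 - gamma) * w for w in weights]
        vertices.append(v_new)
        weights.append(gamma)

    return vertices, weights

Under these assumptions, the returned mixed policy would be executed by sampling one of the stored policies according to weights at the start of each episode.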
