ICML Principal-Driven Reward Design and Agent Policy Alignment via Bilevel-RL

Poster in Workshop: Interactive Learning with Implicit Human Feedback

Principal-Driven Reward Design and Agent Policy Alignment via Bilevel-RL

Souradip Chakraborty · Amrit Bedi · Alec Koppel · Furong Huang · Mengdi Wang

Abstract:

In reinforcement learning (RL), a reward function is often assumed at the outset of a policy optimization procedure. Learning under such a fixed reward can neglect important policy optimization considerations, such as state space coverage and safety. Moreover, it can fail to encompass broader impacts in terms of social welfare, sustainability, or market stability, potentially leading to undesirable emergent behavior and misaligned policies. To mathematically encapsulate the problem of aligning RL policy optimization with such externalities, we consider a bilevel optimization problem and connect it to a principal-agent framework, where the principal specifies the broader goals and constraints of the system at the upper level and the agent solves a Markov Decision Process (MDP) at the lower level. The upper level learns a suitable reward parametrization corresponding to the broader goals, and the lower level learns the agent's policy. We propose Principal-driven Policy Alignment via Bilevel RL (PPA-BRL), which efficiently aligns the policy of the agent with the principal's goals. We explicitly analyze the dependence of the principal's trajectory on the lower-level policy and prove the convergence of PPA-BRL to a stationary point of the problem. We illustrate the merits of this framework for alignment with several examples spanning energy-efficient manipulation tasks, social welfare-based tax design, and cost-effective robotic navigation.
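
The bilevel structure described in the abstract can be written schematically as follows. This is a generic sketch of a principal-agent bilevel RL problem, not the paper's own formulation: the symbols $\nu$ (reward parameters), $r_\nu$ (parametrized reward), $G$ (principal's objective), $V$ (agent's value), $\gamma$ (discount factor), and $\pi^*(\nu)$ (agent's best response) are all notation assumed here for illustration.

```latex
% Upper level (principal): choose reward parameters \nu so that the
% agent's induced policy serves the principal's broader objective G.
\max_{\nu} \; G\bigl(\nu, \pi^{*}(\nu)\bigr)

% Lower level (agent): solve the MDP under the reward r_\nu set by the principal.
\text{s.t.} \quad \pi^{*}(\nu) \in \arg\max_{\pi} \; V(\pi; \nu),
\qquad
V(\pi; \nu) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \, r_{\nu}(s_t, a_t) \right]
```

In this reading, the principal's objective depends on $\nu$ both directly and through the trajectories generated by $\pi^{*}(\nu)$, which is the dependence the abstract says is analyzed explicitly.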
