

Poster

Listwise Reward Estimation for Offline Preference-based Reinforcement Learning

Heewoong Choi · Sangwon Jung · Hongjoon Ahn · Taesup Moon


Abstract:

In Reinforcement Learning (RL), designing precise reward functions remains a challenge, particularly when aligning with human intent. Preference-based RL (PbRL) addresses this by learning reward models from human feedback. However, existing PbRL methods often overlook second-order preferences, which indicate the relative strength of a preference, limiting their effectiveness. In this paper, we propose Listwise Reward Estimation (LiRE), a novel approach for offline PbRL. LiRE constructs a Ranked List of Trajectories (RLT) using the same feedback type and budget as traditional methods, but leverages second-order preference information. By sequentially comparing each new trajectory against the trajectories already in the ranked list, LiRE uses feedback efficiently, leading to superior reward function estimation. We validate LiRE through extensive experiments on a new offline PbRL dataset. The results demonstrate the effectiveness of LiRE, which outperforms baselines even with modest feedback budgets. We also analyze LiRE's robustness to factors such as the number of feedback samples and feedback noise, affirming its reliability and scalability.
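The abstract describes building a Ranked List of Trajectories by sequentially comparing each new trajectory to trajectories already in the list. Below is a minimal sketch of that idea, assuming a pairwise preference oracle named `query_preference` and tie handling via groups of equally preferred trajectories; these names and the binary-search insertion are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch: insert trajectories one by one into a ranked list using a
# pairwise preference oracle. The oracle and grouping scheme are assumptions
# for illustration, not the paper's exact procedure.
from typing import Any, Callable, List

def build_rlt(trajectories: List[Any],
              query_preference: Callable[[Any, Any], int]) -> List[List[Any]]:
    """Return a ranked list of groups, ordered from least to most preferred.

    query_preference(a, b) is assumed to return -1 if a is less preferred
    than b, 0 if they are judged equally preferable, and 1 if a is preferred.
    """
    rlt: List[List[Any]] = []          # each entry is a group of tied trajectories
    for traj in trajectories:
        lo, hi = 0, len(rlt)
        inserted = False
        # Binary search over existing groups, so each insertion needs only
        # O(log n) feedback queries rather than comparisons with every entry.
        while lo < hi:
            mid = (lo + hi) // 2
            outcome = query_preference(traj, rlt[mid][0])
            if outcome == 0:           # tie: join the existing group
                rlt[mid].append(traj)
                inserted = True
                break
            elif outcome < 0:          # less preferred: search the lower half
                hi = mid
            else:                      # more preferred: search the upper half
                lo = mid + 1
        if not inserted:
            rlt.insert(lo, [traj])     # start a new group at the found rank
    return rlt
```

Under these assumptions, the resulting list order (e.g., each trajectory's group index) could serve as a listwise target for reward model training, exposing relative preference strength that isolated pairwise labels would not capture.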
