Poster
Listwise Reward Estimation for Offline Preference-based Reinforcement Learning
Heewoong Choi · Sangwon Jung · Hongjoon Ahn · Taesup Moon
In Reinforcement Learning (RL), designing precise reward functions remains a challenge, particularly when aligning with human intent. Preference-based RL (PbRL) addresses this by learning reward models from human feedback. However, existing PbRL methods often overlook second-order preferences, which indicate the relative strength of a preference, limiting their effectiveness. In this paper, we propose Listwise Reward Estimation (LiRE), a novel approach for offline PbRL. LiRE constructs a Ranked List of Trajectories (RLT) using the same feedback type and budget as traditional methods while leveraging second-order preference information. By sequentially comparing each new trajectory to the trajectories already in the ranked list, LiRE uses feedback efficiently, leading to superior reward function estimation. We validate LiRE through extensive experiments on a new offline PbRL dataset. The results demonstrate the effectiveness of LiRE, which outperforms baselines even with modest feedback budgets. Additionally, we analyze LiRE's robustness to factors such as the number of feedback queries and feedback noise, affirming its reliability and scalability.
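The abstract's core idea, sequentially inserting each trajectory into a ranked list via pairwise comparisons, can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the `prefer` oracle and the convention of grouping tied trajectories (so the list captures relative preference strength) are assumptions for the sake of the example.

```python
def build_rlt(trajectories, prefer):
    """Build a Ranked List of Trajectories (RLT), ordered least- to
    most-preferred, by comparing each new trajectory against the list.

    `prefer(a, b)` is a hypothetical preference oracle returning 1 if
    a is preferred, 0 if b is preferred, and 0.5 for a tie. Tied
    trajectories share a group, encoding equal preference levels.
    """
    rlt = []  # list of groups of equally-preferred trajectories
    for traj in trajectories:
        for i, group in enumerate(rlt):
            outcome = prefer(traj, group[0])
            if outcome == 0.5:   # tie: join this preference level
                group.append(traj)
                break
            if outcome == 0:     # existing group preferred: insert before it
                rlt.insert(i, [traj])
                break
        else:                    # preferred over every group so far
            rlt.append([traj])
    return rlt
```

With a toy oracle that prefers the trajectory with the higher scalar return, `build_rlt([3, 1, 3, 2], prefer)` yields `[[1], [2], [3, 3]]`: each comparison reuses the same binary/ternary feedback type as standard PbRL, yet the resulting list carries listwise ordering information.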