

Poster in Workshop: Models of Human Feedback for AI Alignment

Comparing Comparisons: Informative and Easy Human Feedback with Distinguishability Queries

Xuening Feng · Zhaohui Jiang · Timo Kaufmann · Eyke Hüllermeier · Paul Weng · Yifei Zhu

[ Project Page ]
Fri 26 Jul 8 a.m. PDT — 8 a.m. PDT

Abstract:

Learning human objectives from preference feedback has significantly advanced reinforcement learning (RL) in domains where objectives are hard to formalize. Traditional methods based on pairwise trajectory comparisons face two challenges: trajectories with subtle differences are hard to compare, and comparisons are ordinal, limiting direct inference of preference strength. In this paper, we introduce the distinguishability query, in which humans compare two pairs of trajectories, indicate which pair is easier to compare, and then give preference feedback on the easier pair. This type of query allows direct inference of preference strength and is expected to reduce the labeler's cognitive load. We also connect this query to cardinal utility and difference relations, and develop an efficient query selection scheme that achieves a better trade-off between query informativeness and ease of answering. Experimental results empirically demonstrate the potential of our method for faster, more data-efficient learning and improved user-friendliness on RLHF benchmarks.
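The following is a minimal sketch, not the authors' implementation, of the query protocol described in the abstract: the labeler is shown two trajectory pairs, indicates which pair is easier to compare, and then states a preference within that pair. The trajectory representation, the return function, and the simulated labeler (who judges "easier" by return gap) are illustrative assumptions, not details from the paper.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: a trajectory is summarized here by its per-step rewards.
Trajectory = Tuple[float, ...]

def ret(traj: Trajectory) -> float:
    """Return of a trajectory under an assumed (known) reward signal."""
    return sum(traj)

@dataclass
class DistinguishabilityQuery:
    """One query: two pairs of trajectories shown to the labeler."""
    pair_a: Tuple[Trajectory, Trajectory]
    pair_b: Tuple[Trajectory, Trajectory]

def simulate_labeler(query: DistinguishabilityQuery) -> Tuple[str, int]:
    """Simulated labeler (illustrative assumption): the pair with the larger
    return gap is 'easier' to compare; the preference within that pair
    follows the larger return. Returns (easier_pair_id, preferred_index)."""
    gap_a = abs(ret(query.pair_a[0]) - ret(query.pair_a[1]))
    gap_b = abs(ret(query.pair_b[0]) - ret(query.pair_b[1]))
    easier = "a" if gap_a >= gap_b else "b"
    pair = query.pair_a if easier == "a" else query.pair_b
    preferred = 0 if ret(pair[0]) >= ret(pair[1]) else 1
    return easier, preferred

# Toy usage: pair_a has a subtle difference (hard), pair_b a clear one (easy).
q = DistinguishabilityQuery(
    pair_a=((1.0, 0.5), (0.9, 0.6)),
    pair_b=((2.0, 1.5), (0.1, 0.2)),
)
print(simulate_labeler(q))  # expected: ('b', 0)
```

In the paper's setting the "easier" judgment carries cardinal information (the easier pair is the one whose utilities differ more), which is what lets this query type go beyond purely ordinal pairwise comparisons.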
