

Poster

Comparing Few to Rank Many: Active Human Preference Learning Using Randomized Frank-Wolfe Method

Kiran Thekumparampil · Gaurush Hiranandani · Kousha Kalantari · Shoham Sabach · Branislav Kveton

East Exhibition Hall A-B #E-1808
Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: We study learning human preferences from limited comparison feedback, a core machine learning problem that is at the center of reinforcement learning from human feedback (RLHF). We formulate the problem as learning a Plackett-Luce (PL) model from a limited number of $K$-subset comparisons over a universe of $N$ items, where typically $K \ll N$. Our objective is to select the $K$-subsets such that all items can be ranked with minimal mistakes within the budget. We solve the problem using the D-optimal design, which minimizes the worst-case ranking loss under the estimated PL model. All known algorithms for this problem are computationally infeasible in our setting because the number of $K$-subsets is exponential in $K$. To address this challenge, we propose a randomized Frank-Wolfe algorithm with memoization and sparse updates that has a low $O(N^2 + K^2)$ per-iteration complexity. We analyze it and demonstrate its empirical superiority on synthetic and open-source NLP datasets.
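To make the idea concrete, below is a minimal illustrative sketch (not the authors' implementation) of a randomized Frank-Wolfe step for a D-optimal design over $K$-subsets. It assumes, hypothetically, that the information contributed by a subset $S$ is approximated by the Laplacian of the complete comparison graph on $S$, and that the Frank-Wolfe linear subproblem is solved approximately by sampling a small pool of random subsets instead of enumerating all $\binom{N}{K}$ of them; the memoization and sparse-update tricks that give the paper's $O(N^2 + K^2)$ per-iteration cost are not reproduced here.

```python
# Hypothetical sketch of a randomized Frank-Wolfe design for subset comparisons.
# Not the paper's algorithm: the information surrogate, subproblem sampler,
# and step size below are illustrative assumptions.
import numpy as np

def subset_information(N, S):
    """Surrogate information matrix of a K-subset S: the Laplacian of the
    complete comparison graph on S, i.e. sum of (e_i - e_j)(e_i - e_j)^T."""
    A = np.zeros((N, N))
    items = list(S)
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            i, j = items[a], items[b]
            A[i, i] += 1.0
            A[j, j] += 1.0
            A[i, j] -= 1.0
            A[j, i] -= 1.0
    return A

def randomized_fw_design(N, K, n_iters=200, pool_size=50, reg=1e-3, seed=0):
    """Build a design (weighted collection of K-subsets) by Frank-Wolfe on
    the log-det objective, with the linear step solved over a random pool."""
    rng = np.random.default_rng(seed)
    V = reg * np.eye(N)          # memoized design matrix (regularized)
    design = []                  # list of (subset, weight) pairs
    for t in range(n_iters):
        V_inv = np.linalg.inv(V)  # gradient of log det(V) is V^{-1}
        # Randomized linear subproblem: sample candidate K-subsets and keep
        # the one most aligned with the gradient.
        best_S, best_score = None, -np.inf
        for _ in range(pool_size):
            S = tuple(rng.choice(N, size=K, replace=False))
            score = np.trace(V_inv @ subset_information(N, S))
            if score > best_score:
                best_S, best_score = S, score
        gamma = 2.0 / (t + 2)    # standard Frank-Wolfe step size
        V = (1 - gamma) * V + gamma * subset_information(N, best_S)
        design = [(S, (1 - gamma) * w) for S, w in design]
        design.append((best_S, gamma))
    return design, V

if __name__ == "__main__":
    design, V = randomized_fw_design(N=30, K=4)
    print("log det of final design matrix:", np.linalg.slogdet(V)[1])
```

In this sketch, the returned design can be interpreted as a sampling distribution over which $K$-subsets to query; the randomized subproblem stands in for the exact maximization over all subsets, which is what the paper's low per-iteration complexity makes tractable.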

Lay Summary:

Take any collection of objects, such as movies, books, or music. Suppose that you want to learn the preference of a person over these objects, but you can ask them only to compare a smaller subset of the objects. How would you do that in the minimum number of questions? We propose, analyze, and empirically evaluate a method that computes these questions.
