Keywords: [ Active Learning ] [ Ranking and Preference Learning ] [ Online Learning / Bandits ] [ Online Learning, Active Learning, and Bandits ]

Abstract:
We consider PAC learning a good item from $k$-subsetwise feedback sampled from a Plackett-Luce probability model, with instance-dependent sample complexity performance. In the setting where subsets of a fixed size can be tested and top-ranked feedback is made available to the learner, we give an optimal instance-dependent algorithm with a sample complexity bound for PAC best arm identification algorithm of
$O\bigg(\frac{\Theta_{[k]}}{k}\sum_{i = 2}^n\max\Big(1,\frac{1}{\Delta_i^2}\Big) \ln\frac{k}{\delta}\Big(\ln \frac{1}{\Delta_i}\Big)\bigg)$, $\Delta_i$ being the Plackett-Luce parameter gap between the best and the $i^{th}$ best item, and $\Theta_{[k]}$ is the sum of the Plackett-Luce parameters for top-$k$ items. The algorithm is based on a wrapper around a PAC winner-finding algorithm with weaker performance guarantees to adapt to the hardness of the input instance. The sample complexity is also shown to be multiplicatively better depending on the length of rank-ordered feedback available in each subset-wise play. We show optimality of our algorithms with matching sample complexity lower bounds. We next address the winner-finding problem in Plackett-Luce models in the fixed-budget setting with instance dependent upper and lower bounds on the misidentification probability, of $\Omega\left(\exp(-2 \tilde \Delta Q) \right)$ for a given budget $Q$, where $\tilde \Delta$ is an explicit instance-dependent problem complexity parameter. Numerical performance results are also reported for the algorithms.