Workshop: Complex feedback in online learning

Unimodal Mono-Partite Matching in a Bandit Setting

Matthieu Rodet · Romaric Gaudel


We tackle a new emerging problem, which is finding an optimal monopartite matching in a weighted graph. The semi-bandit version, where a full matching is sampled at each iteration, has been addressed by (Sentenac et al., 2021), creating an algorithm with an expected regret matching O(L log(L)/∆ log(T)) with 2L players, T iterations and a minimum reward gap ∆. We reduce this bound in two steps. First, as in (Gauthier et al.,2021) and (Gauthier et al., 2022) we use the uni-modality property of the expected reward on the appropriate graph to design an algorithm with a regret in O(L /∆ log(T)). Secondly, we show that by moving the focus towards the main question ‘Is user i better than user j?’ this regret becomes O(L ∆ / D^2 log(T)), where D>∆ derives from a better way of comparing users. Some experimental results finally show these theoretical results are corroborated in practice.

Chat is not available.