Poster
in
Workshop: Workshop on Reinforcement Learning Theory
Collision Resolution in Multi-player Bandits Without Observing Collision Information
Eleni Nisioti · Nikolaos Thomos · Boris Bellalta · Anders Jonsson
The absence of collision information in Multi-player Multi-armed bandits (MMABs) renders arm availabilities partially observable, impeding the design of algorithms with regret guarantees that do not allow inter-player communication. In this work, we propose a collision resolution (CR) mechanism for MMABs inspired from sequential interference mechanisms employed in communication protocols. In the general case, our collision resolution mechanism assumes that players can pull multiple arms during the exploration phase. We, thus, propose a novel MMAB model that captures this while still considering strictly bandit feedback and single-pulls during the exploitation phase. We theoretically analyze the CR mechanism using tools from information theory in order to prove the existence of an upper bound on the probability of its failure that decreases at a rate exponential in the number of players.