Skip to yearly menu bar Skip to main content

Workshop: Workshop on Reinforcement Learning Theory

Collision Resolution in Multi-player Bandits Without Observing Collision Information

Eleni Nisioti · Nikolaos Thomos · Boris Bellalta · Anders Jonsson


The absence of collision information in Multi-player Multi-armed bandits (MMABs) renders arm availabilities partially observable, impeding the design of algorithms with regret guarantees that do not allow inter-player communication. In this work, we propose a collision resolution (CR) mechanism for MMABs inspired from sequential interference mechanisms employed in communication protocols. In the general case, our collision resolution mechanism assumes that players can pull multiple arms during the exploration phase. We, thus, propose a novel MMAB model that captures this while still considering strictly bandit feedback and single-pulls during the exploitation phase. We theoretically analyze the CR mechanism using tools from information theory in order to prove the existence of an upper bound on the probability of its failure that decreases at a rate exponential in the number of players.

Chat is not available.