Decentralized Exploration in Multi-Armed Bandits
Raphael Feraud · REDA ALAMI · Romain Laroche

Thu Jun 13th 04:00 -- 04:20 PM @ Hall B

We consider the decentralized exploration problem: a set of players collaborate to identify the best arm by asynchronously interacting with the same stochastic environment. The objective is to insure privacy in the best arm identification problem between asynchronous, collaborative, and thrifty players. In the context of a digital service, we advocate that this decentralized approach allows a good balance between conflicting interests: the providers optimize their services, while protecting privacy of users and saving resources. We define the privacy level as the amount of information an adversary could infer by intercepting all the messages concerning a single user. We provide a generic algorithm DECENTRALIZED ELIMINATION, which uses any best arm identification algorithm as a subroutine. We prove that this algorithm insures privacy, with a low communication cost, and that in comparison to the lower bound of the best arm identification problem, its sample complexity suffers from a penalty depending on the inverse of the probability of the most frequent players. Then, thanks to the genericity of the approach, we extend the proposed algorithm to the non-stationary bandits. Finally, experiments illustrate and complete the analysis.

Author Information

Raphael Feraud (Orange Labs)
REDA ALAMI (Orange Labs - Paris Saclay University - INRIA)

Réda Alami is actually a PhD candidate at Paris Saclay University (France). He is belonging to three research teams simultaneously: -1- Data Intelligence and Algorithmic: the AI research department of Orange Labs. -2- TAO team attached to INRIA Saclay https://www.inria.fr/en/teams/tao -3- SequeL team attached to INRIA Lille https://team.inria.fr/sequel/ His research interests include Machine Learning and especially Reinforcement Learning. The main goal of his thesis is to resolve the problem of the multi-armed bandit in a non-stationary environment. The non-stationary behavior of the environment is characterized with several switches observed at some unknown instants. He is interested in providing theoretical guarantees of the Bayesian Online changepoint detector proposed by Ryan Prescott Adams in 2007. Indeed, this change point detector can easily be combined with the whole family of stationary bandit solvers. We call this family Memory Bandits. The second goal of his thesis is to prove that this family is an excellent solver of the non-stationary bandit problem. During his PhD thesis, he is supervised by: -1- Odalric Maillard, a research fellow at INRIA Lille (SequeL team), France. -2- Raphael Feraud, a Senior scientist at Orange Labs Lannion, France. -3- Michele Sebag, research director at INRIA Saclay, France. During his PhD thesis, Réda has also worked on two industrial applications of the non-stationary multi-armed bandit setting: the 5G network optimization via LoRa and TSCH protocols. Before holding his PhD position, he has graduated (double Master degree in Machine Learning and Applied Mathematics) with first class honours from IMT Atlantique (Ex TELECOM Bretagne), one of the top french engineering school specialized in Machine Learning and applied mathematics. Not to mention the plethora of Machine Learning projects he conducted during his training and which resulted in eight high-level publications at seven renowned conferences (EUSIPCO 2015 - IEA/AIE 2017 - NIPS 2017 - ICT 2018 - EWRL 2018 - MSWIM 2018 - ICML 2019) and a chapter book (Springer Nature). To learn more about his papers, please visit his google scholar page: https://scholar.google.com/citations?user=2j--RT8AAAAJ&hl=en To learn more about his professionnal career, please visit his linkedIn page: https://www.linkedin.com/in/reda-alami-067304109/

Romain Laroche (Microsoft Research)

Related Events (a corresponding poster, oral, or spotlight)