An Optimal Private Stochastic-MAB Algorithm based on Optimal Private Stopping Rule
Touqir Sajed · Or Sheffet

Tue Jun 11th 04:30 -- 04:35 PM @ Room 102

We present a provably optimal differentially private algorithm for the stochastic multi-arm bandit problem, as opposed to the private analogue of the UCB-algorithm (Mishra and Thakurta, 2015; Tossou and Dimitrakakis, 2016) which doesn't meet the recently discovered lower-bound of $\Omega \left(\frac{K\log(n)}{\epsilon} \right)$ (Shariff and Sheffet, 2018). Our construction is based on a different algorithm, Successive Elimination (Even-Dar et al., 2002), that repeatedly pulls all remaining arms until an arm is found to be suboptimal and is then eliminated. In order to devise a private analogue of Successive Elimination we visit the problem of private \emph{stopping rule}, that takes as input a stream of i.i.d samples from an unknown distribution and returns a \emph{multiplicative} $(1 \pm \alpha)$-approximation of the distribution's mean, and prove the optimality of our private stopping rule. We then leverage on the private stopping rule algorithm and present the private Successive Elimination algorithm which meets both the non-private lower bound (Lai and Robbins, 1985) and the above-mentioned private lower bound. We also compare empirically the performance of our algorithm with the private UCB algorithm.

Author Information

Touqir Sajed (University of Alberta)

Touqir Sajed is a graduate student working under Professor Or Sheffet at the University of Alberta. His research areas include Differential Privacy, Multi-Armed Bandits and Reinforcement Learning. He finished his Undergraduate studies in Computing Science at the University of Alberta.

Or Sheffet (University of Alberta)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors