
On Thompson Sampling with Langevin Algorithms
Eric Mazumdar · Aldo Pacchiano · Yian Ma · Michael Jordan · Peter Bartlett

Thu Jul 16 12:00 PM -- 12:45 PM & Thu Jul 16 11:00 PM -- 11:45 PM (PDT)

Thompson sampling for multi-armed bandit problems is known to enjoy favorable performance in both theory and practice, though it suffers from a significant computational limitation arising from the need for samples from posterior distributions at every iteration. To address this issue, we propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling. We construct quickly converging Langevin algorithms to generate approximate samples with accuracy guarantees, and leverage novel posterior concentration rates to analyze the regret of the resulting approximate Thompson sampling algorithm. Further, we specify the hyperparameters that the MCMC procedure needs to guarantee optimal instance-dependent frequentist regret while keeping computational complexity low. In particular, our algorithms take advantage of both posterior concentration and a sample-reuse mechanism to ensure that only a constant number of iterations and a constant amount of data are needed in each round. The resulting approximate Thompson sampling algorithm has logarithmic regret, and its computational complexity does not scale with the time horizon of the algorithm.
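The core idea can be sketched in code. Below is a minimal illustrative implementation (not the paper's exact algorithm) of Thompson sampling for a Gaussian bandit in which each posterior sample is produced by a constant number of unadjusted Langevin steps, warm-started from the previous round's iterate to mimic the sample-reuse mechanism. The Gaussian prior, unit reward variance, step-size schedule, and all parameter values are assumptions chosen for the example.

```python
import numpy as np

def langevin_thompson_sampling(true_means, horizon=2000, n_steps=20,
                               base_step=0.5, noise_var=1.0, seed=0):
    """Approximate Thompson sampling for a Gaussian multi-armed bandit.

    Illustrative sketch (assumed setup, not the paper's exact algorithm):
    each arm's mean has a N(0, 1) prior and unit-variance Gaussian rewards,
    so the posterior is log-concave and a few unadjusted Langevin steps per
    round suffice. Warm-starting `theta` across rounds plays the role of
    the sample-reuse mechanism described in the abstract.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    sums = np.zeros(k)      # running reward sums per arm
    counts = np.zeros(k)    # number of pulls per arm
    theta = np.zeros(k)     # Langevin iterates, reused across rounds
    total_reward = 0.0
    for _ in range(horizon):
        for a in range(k):
            # Step size shrinks with the posterior precision (1 + n pulls)
            # so the Langevin chain stays stable as the posterior concentrates.
            step = base_step / (1.0 + counts[a])
            for _ in range(n_steps):
                # Gradient of the log posterior: N(0,1) prior term plus
                # Gaussian likelihood term; zero at the posterior mean.
                grad = -theta[a] + (sums[a] - counts[a] * theta[a]) / noise_var
                theta[a] += 0.5 * step * grad \
                    + np.sqrt(step) * rng.standard_normal()
        # Thompson step: act greedily with respect to the posterior samples.
        arm = int(np.argmax(theta))
        reward = true_means[arm] + np.sqrt(noise_var) * rng.standard_normal()
        sums[arm] += reward
        counts[arm] += 1
        total_reward += reward
    return counts, total_reward
```

Because the per-round cost is a fixed number of Langevin steps on a gradient that depends only on running sufficient statistics, the work per round stays constant rather than growing with the time horizon, which is the computational point the abstract emphasizes.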

Author Information

Eric Mazumdar (UC Berkeley)
Aldo Pacchiano (UC Berkeley)
Yian Ma (Google)
Michael Jordan (UC Berkeley)
Peter Bartlett (UC Berkeley)
