

Poster
in
Workshop: The Many Facets of Preference-Based Learning

Optimistic Thompson Sampling for No-Regret Learning in Unknown Games

Yingru Li · Liangqi LIU · Wenqiang Pu · Zhi-Quan Luo


Abstract:
We propose an Optimism-then-NoRegret learning framework for learning to play a repeated multiplayer game with an unknown reward function and bandit feedback. Our framework encompasses various game algorithms as special cases. It consists of an estimation step that constructs an imagined reward and a no-regret step that plays against an adversary. Thompson Sampling (TS) fits naturally into the framework, but its effectiveness in this context has been unclear. We demonstrate that TS fails in a class of unknown games. To address this, we propose an optimistic variant of TS combined with suitable full-information adversarial bandit algorithms, achieving sublinear regret in unknown games. We establish an information-theoretic regret bound for the proposed algorithms. Our analysis highlights that the optimistic variant encourages more exploration than classical TS in unknown games. We evaluate the algorithms on random matrix games and two real-world applications: radar anti-jamming and traffic routing. The proposed algorithms substantially outperform the baselines.
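The two-step structure described in the abstract (an estimation step that builds an imagined reward, followed by a full-information no-regret step) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a per-entry Gaussian posterior over an unknown payoff matrix, implements optimism by clipping each Thompson sample from below at its posterior mean, uses Hedge (exponential weights) as the full-information no-regret subroutine, and pits the learner against a uniformly random opponent. All names and parameter choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical repeated matrix game with an unknown payoff matrix (bandit feedback).
n_actions, T = 5, 2000
A_true = rng.uniform(-1, 1, size=(n_actions, n_actions))  # hidden from the learner
noise = 0.1

# Per-entry Gaussian posterior summarized by a running mean and a pseudo-count.
mean = np.zeros((n_actions, n_actions))
count = np.ones((n_actions, n_actions))

# Hedge (exponential weights) over the learner's own actions.
weights = np.ones(n_actions)
eta = np.sqrt(np.log(n_actions) / T)

for t in range(T):
    # Estimation step: Thompson-sample each entry, then take an elementwise max
    # with the posterior mean -- a simple "optimistic" modification that only
    # ever raises the imagined reward, encouraging extra exploration.
    sample = mean + rng.standard_normal((n_actions, n_actions)) / np.sqrt(count)
    imagined = np.maximum(sample, mean)

    # No-regret step: play from the Hedge distribution.
    p = weights / weights.sum()
    i = rng.choice(n_actions, p=p)
    j = rng.integers(n_actions)  # opponent plays uniformly in this sketch

    # Bandit feedback: only the played entry's noisy reward is observed.
    r = A_true[i, j] + noise * rng.standard_normal()
    count[i, j] += 1
    mean[i, j] += (r - mean[i, j]) / count[i, j]

    # Full-information update on the imagined reward column for the
    # opponent's observed action, then renormalize for numerical stability.
    weights *= np.exp(eta * imagined[:, j])
    weights /= weights.max()

policy = weights / weights.sum()
print("final policy:", np.round(policy, 3))
```

Replacing `imagined = np.maximum(sample, mean)` with `imagined = sample` recovers plain Thompson Sampling inside the same loop, which is the comparison the abstract draws.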
