Spotlight

Greedy when Sure and Conservative when Uncertain about the Opponents

Haobo Fu ⋅ Ye Tian ⋅ Hongxiang Yu ⋅ Weiming Liu ⋅ Shuang Wu ⋅ Jiechao Xiong ⋅ Ying Wen ⋅ Kai Li ⋅ Junliang Xing ⋅ Qiang Fu ⋅ Wei Yang

Keywords: PM: Bayesian Models and Methods ⋅ RL: Online ⋅ RL: Multi-agent ⋅ PM: Variational Inference

2022 Spotlight

Abstract

We develop a new approach, named Greedy when Sure and Conservative when Uncertain (GSCU), to competing online against unknown and nonstationary opponents. GSCU improves on existing methods in four aspects: 1) it introduces a novel way of learning opponent policy embeddings offline; 2) it trains offline a single best response against any opponent, conditioned additionally on our opponent policy embedding, instead of a finite set of separate best responses; 3) it computes online a posterior of the current opponent policy embedding, without making the discrete and ineffective decision of which type the current opponent belongs to; and 4) it selects online between a real-time greedy policy and a fixed conservative policy via an adversarial bandit algorithm, gaining a theoretically better regret than adhering to either. Experimental studies on popular benchmarks demonstrate GSCU's superiority over the state-of-the-art methods. The code is available online at https://github.com/YeTianJHU/GSCU.
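
To make the fourth aspect concrete, the following is a minimal sketch of choosing between two policies with an adversarial bandit. The abstract does not say which bandit algorithm GSCU uses, so EXP3 is an assumption here, and the class name `Exp3Selector`, the exploration rate `gamma`, and the toy reward schedule in `simulate` are all hypothetical and not taken from the paper or its repository.

```python
import math
import random


class Exp3Selector:
    """EXP3 over two arms: 0 = greedy policy, 1 = conservative policy.

    Assumption: EXP3 stands in for the unnamed adversarial bandit algorithm.
    """

    def __init__(self, n_arms=2, gamma=0.1, seed=0):
        self.n_arms = n_arms
        self.gamma = gamma              # exploration rate (hypothetical value)
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the normalised weights with uniform exploration.
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n_arms
                for w in self.weights]

    def select(self):
        # Sample an arm according to the current probabilities.
        return self.rng.choices(range(self.n_arms),
                                weights=self.probabilities())[0]

    def update(self, arm, reward):
        # reward must lie in [0, 1]; only the played arm is updated,
        # using an importance-weighted reward estimate.
        estimate = reward / self.probabilities()[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale so the largest weight is 1, which avoids overflow
        # without changing the selection probabilities.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def simulate(episodes=200, seed=1):
    """Toy run: the greedy arm pays well at first, then poorly (made-up rewards)."""
    selector = Exp3Selector(seed=seed)
    total = 0.0
    for t in range(episodes):
        arm = selector.select()
        if arm == 0:
            reward = 0.9 if t < episodes // 2 else 0.1   # greedy policy
        else:
            reward = 0.5                                 # conservative policy
        selector.update(arm, reward)
        total += reward
    return total, selector.probabilities()
```

Each episode, the selector samples a policy, plays it, and feeds the normalised episode return back through `update`. Because the update is importance-weighted, the selector can shift toward the conservative arm once the greedy arm starts to fail, which is the behaviour the title describes.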
