Poster in Workshop: Foundations of Reinforcement Learning and Control: Connections and Perspectives

Robust Best-of-Both-Worlds Gap Estimators Based on Importance-Weighted Sampling

Sarah Clusiau · Saeed Masoudian · Yevgeny Seldin

Abstract

We present a novel strategy for robust estimation of the gaps in multi-armed bandits that is based on importance-weighted sampling. The strategy is applicable in the best-of-both-worlds setting: it can be used in both the stochastic and adversarial regimes with no prior knowledge of the regime. It relies on a pair of estimators, one based on standard importance-weighted sampling to upper bound the losses, and another based on importance-weighted sampling with implicit exploration to lower bound the losses. We combine the strategy with the EXP3++ algorithm to achieve best-of-both-worlds regret guarantees in the stochastic and adversarial regimes, and in the stochastically constrained adversarial regime. We conjecture that the strategy can be applied more broadly to robust gap estimation in reinforcement learning, which will be studied in future work.
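As a rough illustration of the two kinds of loss estimators named in the abstract, the sketch below contrasts a standard importance-weighted estimate with an implicit-exploration variant. It is not the paper's construction: the function names, the fixed sampling distribution, the Bernoulli losses, and the constant `gamma` are all assumptions made for the example, and it does not show how the paper turns the estimates into gap bounds.

```python
import random


def iw_estimate(loss, played, arm, prob):
    """Standard importance-weighted estimate: loss / prob if the arm was played, else 0.

    Unbiased for the arm's loss when prob is the probability of playing it.
    """
    return loss / prob if played == arm else 0.0


def ix_estimate(loss, played, arm, prob, gamma):
    """Implicit-exploration estimate: gamma > 0 is added to the denominator.

    For non-negative losses this never exceeds the standard estimate,
    so it is biased downward.
    """
    return loss / (prob + gamma) if played == arm else 0.0


def simulate(means, probs, gamma, rounds, seed=0):
    """Accumulate both estimates per arm under a fixed sampling distribution.

    means: Bernoulli loss means per arm (hypothetical stochastic environment).
    probs: fixed probabilities of playing each arm (an assumption; a real
           algorithm such as EXP3++ would update these every round).
    """
    rng = random.Random(seed)
    arms = list(range(len(means)))
    iw_sum = [0.0] * len(means)
    ix_sum = [0.0] * len(means)
    for _ in range(rounds):
        played = rng.choices(arms, weights=probs)[0]
        loss = 1.0 if rng.random() < means[played] else 0.0
        for a in arms:
            iw_sum[a] += iw_estimate(loss, played, a, probs[a])
            ix_sum[a] += ix_estimate(loss, played, a, probs[a], gamma)
    return iw_sum, ix_sum


if __name__ == "__main__":
    iw_sum, ix_sum = simulate(means=[0.3, 0.5], probs=[0.5, 0.5],
                              gamma=0.05, rounds=10_000)
    print("importance-weighted cumulative estimates:", iw_sum)
    print("implicit-exploration cumulative estimates:", ix_sum)
```

Because the losses are non-negative and `gamma` is positive, the implicit-exploration sums never exceed the standard importance-weighted sums in this sketch, which is the sense in which the two estimators sit on opposite sides of the true cumulative loss.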
