Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback

Tiancheng Jin ⋅ Tal Lancewicki ⋅ Haipeng Luo ⋅ Yishay Mansour ⋅ Aviv Rosenberg

