Poster
Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits
Jiatai Huang · Yan Dai · Longbo Huang
In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have $\alpha$-th ($1<\alpha\le 2$) moments bounded by $\sigma^\alpha$, while the variances may not exist. Specifically, we design an algorithm \texttt{HTINF}: when the heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret in both stochastic and adversarial environments, without knowing the actual environment type a priori. When $\alpha$ and $\sigma$ are unknown, \texttt{HTINF} still achieves a $\log T$-style instance-dependent regret in stochastic cases and an $o(T)$ no-regret guarantee in adversarial cases. We further develop an algorithm \texttt{AdaTINF}, which achieves the $\mathcal O(\sigma K^{1-\nicefrac 1\alpha}T^{\nicefrac{1}{\alpha}})$ minimax-optimal regret even in adversarial settings, without prior knowledge of $\alpha$ and $\sigma$. This result matches the known regret lower bound (Bubeck et al., 2013), which was derived for stochastic environments with both $\alpha$ and $\sigma$ known. To our knowledge, the proposed \texttt{HTINF} algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and \texttt{AdaTINF} is the first algorithm that can adapt to both $\alpha$ and $\sigma$ to achieve the optimal gap-independent regret bound in both the classical heavy-tailed stochastic MAB setting and our novel adversarial formulation.
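The INF (Implicitly Normalized Forecaster) family that these algorithms build on maintains cumulative importance-weighted loss estimates and samples each arm from a Tsallis-entropy-regularized distribution; heavy tails are typically handled by truncating large loss estimates. The minimal sketch below illustrates that general structure only. It is not the paper's actual \texttt{HTINF} or \texttt{AdaTINF}: the 1/2-Tsallis potential, the $2/\sqrt{t}$ learning-rate schedule, the clipping threshold, and all function names are illustrative assumptions.

```python
import numpy as np

def tsallis_weights(L, eta, iters=60):
    """Probabilities w_i = 4 / (eta * (L_i - x))**2, with the normalizer x
    chosen by Newton's method so that the weights sum to one (x < min(L))."""
    x = np.min(L) - 2.0 / eta              # start where f(x) >= 0
    for _ in range(iters):
        d = eta * (L - x)                  # d_i > 0 while x < min(L)
        f = np.sum(4.0 / d ** 2) - 1.0     # normalization residual
        fp = np.sum(8.0 * eta / d ** 3)    # df/dx > 0
        x -= f / fp                        # Newton step (monotone from the right)
    w = 4.0 / (eta * (L - x)) ** 2
    return w / w.sum()                     # absorb any residual error

def run_clipped_tsallis_inf(loss_fn, K, T, clip=None, seed=0):
    """Play T rounds of a K-armed bandit with a Tsallis-INF-style sampler and
    (optionally) clipped importance-weighted loss estimates."""
    rng = np.random.default_rng(seed)
    Lhat = np.zeros(K)                     # cumulative loss estimates
    counts = np.zeros(K, dtype=int)
    for t in range(1, T + 1):
        eta = 2.0 / np.sqrt(t)             # illustrative learning-rate schedule
        w = tsallis_weights(Lhat, eta)
        arm = rng.choice(K, p=w)
        counts[arm] += 1
        est = loss_fn(arm, rng) / w[arm]   # importance-weighted estimate
        if clip is not None:
            est = min(est, clip)           # truncate huge estimates (heavy tails)
        Lhat[arm] += est
    return counts
```

On a toy two-armed instance with a clear gap, the sampler concentrates its pulls on the better arm; with clipping enabled, occasional enormous importance-weighted estimates (the failure mode under heavy-tailed losses) cannot derail the cumulative estimates.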
Author Information
Jiatai Huang (Tsinghua University)
Yan Dai (Tsinghua University)

Undergrad @ IIIS, Tsinghua. Interested in learning theory. Expected to graduate in 2024.
Longbo Huang (Tsinghua University)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits »
  Tue. Jul 19th 09:25 -- 09:30 PM, Room 310
More from the Same Authors
- 2021: The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition »
  Tiancheng Jin · Longbo Huang · Haipeng Luo
- 2023 Poster: Multi-task Representation Learning for Pure Exploration in Linear Bandits »
  Yihan Du · Longbo Huang · Wen Sun
- 2023 Poster: Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning »
  Jiatai Huang · Yan Dai · Longbo Huang
- 2023 Poster: Refined Regret for Adversarial MDPs with Linear Function Approximation »
  Yan Dai · Haipeng Luo · Chen-Yu Wei · Julian Zimmert
- 2022 Poster: Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification »
  Ling Pan · Longbo Huang · Tengyu Ma · Huazhe Xu
- 2022 Spotlight: Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification »
  Ling Pan · Longbo Huang · Tengyu Ma · Huazhe Xu
- 2022 Poster: Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation »
  Pihe Hu · Yu Chen · Longbo Huang
- 2022 Poster: Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably) »
  Yu Huang · Junyang Lin · Chang Zhou · Hongxia Yang · Longbo Huang
- 2022 Spotlight: Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably) »
  Yu Huang · Junyang Lin · Chang Zhou · Hongxia Yang · Longbo Huang
- 2022 Spotlight: Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation »
  Pihe Hu · Yu Chen · Longbo Huang
- 2020 Poster: Combinatorial Pure Exploration for Dueling Bandit »
  Wei Chen · Yihan Du · Longbo Huang · Haoyu Zhao