Poster in Workshop: 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning
Near Optimal Adversarial Attack on UCB Bandits
Shiliang Zuo
Keywords: [ Bandit Algorithms ] [ Adversarial Attacks ]
Abstract:
I study a stochastic multi-armed bandit problem in which rewards are subject to adversarial corruption. At each round, the learner chooses an arm and a stochastic reward is generated. The adversary strategically adds corruption to the reward, and the learner observes only the corrupted reward. I propose a novel attack strategy that manipulates a learner employing the upper-confidence-bound (UCB) algorithm into pulling a non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\widehat{O}(\sqrt{\log T})$, where $T$ is the number of rounds. I also prove the first lower bound on the cumulative attack cost. The lower bound matches the upper bound up to $O(\log \log T)$ factors, showing the proposed attack strategy to be near optimal.
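To make the setting concrete, the sketch below simulates a UCB1 learner facing a simple corruption rule: whenever a non-target arm is pulled, the adversary subtracts just enough reward to keep that arm's observed empirical mean a fixed margin below the target arm's. This is an illustrative baseline in the spirit of the setting, not the paper's near-optimal strategy (which achieves the $\widehat{O}(\sqrt{\log T})$ cost); the arm means, noise level, and 0.3 margin are all assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 3, 5000
true_means = np.array([0.9, 0.5, 0.2])  # arm 0 is truly optimal
target = 2                              # adversary's non-optimal target arm

counts = np.zeros(K)       # pulls per arm
obs_sums = np.zeros(K)     # sums of corrupted rewards seen by the learner
attack_cost = 0.0          # cumulative corruption paid by the adversary

for t in range(1, T + 1):
    if t <= K:
        arm = t - 1  # UCB1 initialization: pull each arm once
    else:
        ucb = obs_sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))

    reward = rng.normal(true_means[arm], 0.1)
    corruption = 0.0
    if arm != target:
        # Illustrative attack rule (an assumption, not the paper's strategy):
        # keep the pulled arm's empirical mean a margin below the target's.
        target_mean = obs_sums[target] / max(counts[target], 1)
        desired = target_mean - 0.3
        new_mean = (obs_sums[arm] + reward) / (counts[arm] + 1)
        if new_mean > desired:
            corruption = (new_mean - desired) * (counts[arm] + 1)

    counts[arm] += 1
    obs_sums[arm] += reward - corruption
    attack_cost += corruption

print(f"target pulls: {counts[target]:.0f}/{T}, attack cost: {attack_cost:.1f}")
```

Under this rule the non-target arms look suboptimal to UCB, so they are pulled only logarithmically often and the target arm receives the bulk of the $T$ pulls; the per-pull corruption on non-target arms is what the paper's strategy drives down to the $\widehat{O}(\sqrt{\log T})$ total.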