

Oral in Workshop: 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning

Near Optimal Adversarial Attack on UCB Bandits

Keywords: [ Adversarial Attacks ] [ Bandit Algorithms ]


Abstract: I study a stochastic multi-armed bandit problem where rewards are subject to adversarial corruption. I propose a novel attack strategy that manipulates a learner employing the upper-confidence-bound (UCB) algorithm into pulling some non-optimal target arm $T - o(T)$ times with a cumulative attack cost that scales as $\hat{O}(\sqrt{\log T})$, where $T$ is the number of rounds. I also prove the first lower bound on the cumulative attack cost. The lower bound matches the upper bound up to $O(\log \log T)$ factors, showing that the proposed attack strategy is near optimal.
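The attack template the abstract describes can be illustrated with a minimal simulation. The sketch below is an assumption, not the paper's exact construction: it follows the standard reward-poisoning idea for UCB of depressing the observed rewards of non-target arms so that their empirical means stay below the target arm's, causing UCB1 to concentrate its pulls on the target. All arm means, the noise model, and the safety margin are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

K, T = 3, 5000                       # arms and rounds (hypothetical values)
means = np.array([0.9, 0.5, 0.2])    # true mean rewards (hypothetical)
target = 2                           # non-optimal arm the attacker promotes

counts = np.zeros(K)
sums = np.zeros(K)                   # post-corruption reward sums seen by the learner
attack_cost = 0.0                    # cumulative corruption paid by the attacker

for t in range(1, T + 1):
    # UCB1 learner: pull each arm once, then maximize mean + exploration bonus.
    if t <= K:
        arm = t - 1
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))

    reward = means[arm] + rng.normal(0, 0.1)

    # Attack sketch: when a non-target arm is pulled, push its observed reward
    # down so the arm's empirical mean stays a margin below the target's mean.
    corruption = 0.0
    if arm != target and counts[target] > 0:
        mu_target = sums[target] / counts[target]
        margin = np.sqrt(2 * np.log(t) / max(counts[arm], 1))
        desired_mean = mu_target - 2 * margin      # keep this arm clearly worse
        cap = desired_mean * (counts[arm] + 1)
        corruption = max(0.0, sums[arm] + reward - cap)

    attack_cost += corruption
    counts[arm] += 1
    sums[arm] += reward - corruption

print(f"target pulls: {int(counts[target])}/{T}, attack cost: {attack_cost:.1f}")
```

Under this corruption rule the non-target arms are re-pulled only rarely (their exploration bonus must overcome the injected gap), so the target arm collects almost all of the $T$ pulls, mirroring the $T - o(T)$ guarantee in the abstract; the cost profile of the paper's near-optimal strategy is a separate analytical result not reproduced by this toy simulation.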
