Oral
in
Workshop: 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning
Near Optimal Adversarial Attack on UCB Bandits
Keywords: [ Bandit Algorithms ] [ Adversarial Attacks ]
Abstract:
I study a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. I propose a novel attack strategy that manipulates a learner employing the upper-confidence-bound (UCB) algorithm into pulling some non-optimal target arm times with a cumulative cost that scales as , where is the number of rounds. I also prove the first lower bound on the cumulative attack cost. The lower bound matches the upper bound up to factors, showing the proposed attack strategy to be near optimal.
Chat is not available.