
Smooth Non-stationary Bandits
Su Jia · Qian Xie · Nathan Kallus · Peter Frazier

Thu Jul 27 01:30 PM -- 03:00 PM (PDT) @ Exhibit Hall 1 #327
In many applications of online decision making, the environment is non-stationary, and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, and they guarantee $T^{2/3}$ regret. However, in practice environments often change *smoothly*, so such algorithms may incur higher-than-necessary regret in these settings and do not leverage information on the *rate of change*. In this paper, we study a non-stationary two-arm bandit problem where we assume an arm's mean reward is a $\beta$-Hölder function over (normalized) time, meaning it is $(\beta-1)$-times Lipschitz-continuously differentiable. We show the first *separation* between the smooth and non-smooth regimes by presenting a policy with $T^{3/5}$ regret for $\beta=2$. We complement this result with a $T^{\frac{\beta+1}{2\beta+1}}$ lower bound for any integer $\beta\ge 1$, which matches our upper bound for $\beta=2$.
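As a quick arithmetic check on the rates quoted above, the lower-bound exponent $\frac{\beta+1}{2\beta+1}$ can be evaluated for small integer smoothness levels $\beta$; the sketch below (an illustrative helper, not code from the paper) confirms that $\beta=1$ recovers the non-smooth $T^{2/3}$ rate and $\beta=2$ matches the paper's $T^{3/5}$ upper bound.

```python
from fractions import Fraction

def regret_exponent(beta: int) -> Fraction:
    """Lower-bound regret exponent (beta + 1) / (2*beta + 1) for
    beta-Holder mean-reward functions, as stated in the abstract."""
    return Fraction(beta + 1, 2 * beta + 1)

for beta in range(1, 5):
    print(f"beta = {beta}: T^({regret_exponent(beta)})")
# beta = 1 gives T^(2/3) (the non-smooth rate); beta = 2 gives T^(3/5),
# matching the upper bound achieved by the paper's policy.
```

Note that the exponent decreases toward $1/2$ as $\beta$ grows, so greater smoothness can only help, but the separation from the $T^{2/3}$ non-smooth regime is proved here for $\beta=2$.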

Author Information

Su Jia
Qian Xie
Nathan Kallus (Cornell University)
Peter Frazier (Cornell University / Uber)

Peter Frazier is an Associate Professor in the School of Operations Research and Information Engineering at Cornell University. He is also a Staff Data Scientist at Uber, where he managed the data science group for UberPOOL while on sabbatical leave from Cornell. He completed his Ph.D. in Operations Research and Financial Engineering at Princeton University in 2009. Peter's research is in Bayesian optimization, multi-armed bandits, and incentive design for social learning, with applications in e-commerce, the sharing economy, and materials design. He is the recipient of an AFOSR Young Investigator Award and an NSF CAREER Award.
