Poster
Smooth Non-stationary Bandits
Su Jia · Qian Xie · Nathan Kallus · Peter Frazier
In many applications of online decision making, the environment is non-stationary, and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, and guarantee $T^{2/3}$ regret in that setting. However, in practice environments often change *smoothly*, so such algorithms may incur higher-than-necessary regret and do not leverage information on the *rate of change*. In this paper, we study a non-stationary two-arm bandit problem where we assume an arm's mean reward is a $\beta$-Hölder function over (normalized) time, meaning it is $(\beta-1)$-times Lipschitz-continuously differentiable. We show the first *separation* between the smooth and non-smooth regimes by presenting a policy with $T^{3/5}$ regret for $\beta=2$. We complement this result with a $T^{\frac{\beta+1}{2\beta+1}}$ lower bound for any integer $\beta\ge 1$, which matches our upper bound for $\beta=2$.
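As a quick check on the stated rates (a worked instantiation of the abstract's exponent, not an additional result from the paper), plugging small integer values of $\beta$ into the lower-bound exponent $\frac{\beta+1}{2\beta+1}$ recovers both regimes:
$$
\left.\frac{\beta+1}{2\beta+1}\right|_{\beta=1} = \frac{2}{3},
\qquad
\left.\frac{\beta+1}{2\beta+1}\right|_{\beta=2} = \frac{3}{5} < \frac{2}{3},
$$
so the non-smooth (Lipschitz, $\beta=1$) case reproduces the familiar $T^{2/3}$ rate, while the smoother $\beta=2$ case admits the strictly better $T^{3/5}$ rate, which is the claimed separation.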
Author Information
Su Jia
Qian Xie
Nathan Kallus (Cornell University)
Peter Frazier (Cornell University / Uber)
Peter Frazier is an Associate Professor in the School of Operations Research and Information Engineering at Cornell University. He is also a Staff Data Scientist at Uber, where he managed the data science group for UberPOOL while on sabbatical leave from Cornell. He completed his Ph.D. in Operations Research and Financial Engineering at Princeton University in 2009. Peter's research is in Bayesian optimization, multi-armed bandits and incentive design for social learning, with applications in e-commerce, the sharing economy, and materials design. He is the recipient of an AFOSR Young Investigator Award and an NSF CAREER Award.
More from the Same Authors
- 2023 : Provable Offline Reinforcement Learning with Human Feedback »
  Wenhao Zhan · Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun
- 2023 Poster: Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR »
  Kaiwen Wang · Nathan Kallus · Wen Sun
- 2023 Poster: Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings »
  Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun
- 2023 Poster: B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding »
  Miruna Oprescu · Jacob Dorn · Marah Ghoummaid · Andrew Jesson · Nathan Kallus · Uri Shalit
- 2023 Poster: Short-lived High-volume Bandits »
  Su Jia · Nishant Oli · Ian Anderson · Paul Duff · Andrew Li · R. Ravi
- 2022 : Peter Frazier »
  Peter Frazier
- 2022 Poster: Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning »
  Nathan Kallus · Xiaojie Mao · Kaiwen Wang · Zhengyuan Zhou
- 2022 Poster: Learning Bellman Complete Representations for Offline Policy Evaluation »
  Jonathan Chang · Kaiwen Wang · Nathan Kallus · Wen Sun
- 2022 Spotlight: Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning »
  Nathan Kallus · Xiaojie Mao · Kaiwen Wang · Zhengyuan Zhou
- 2022 Oral: Learning Bellman Complete Representations for Offline Policy Evaluation »
  Jonathan Chang · Kaiwen Wang · Nathan Kallus · Wen Sun
- 2021 Poster: Optimal Off-Policy Evaluation from Multiple Logging Policies »
  Nathan Kallus · Yuta Saito · Masatoshi Uehara
- 2021 Spotlight: Optimal Off-Policy Evaluation from Multiple Logging Policies »
  Nathan Kallus · Yuta Saito · Masatoshi Uehara
- 2020 Poster: Statistically Efficient Off-Policy Policy Gradients »
  Nathan Kallus · Masatoshi Uehara
- 2020 Poster: DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training »
  Nathan Kallus
- 2020 Poster: Efficient Policy Learning from Surrogate-Loss Classification Reductions »
  Andrew Bennett · Nathan Kallus
- 2020 Poster: Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation »
  Nathan Kallus · Masatoshi Uehara
- 2019 : Keynote by Peter Frazier: Grey-box Bayesian Optimization for AutoML »
  Peter Frazier
- 2019 Poster: Bayesian Optimization of Composite Functions »
  Raul Astudillo · Peter Frazier
- 2019 Poster: Classifying Treatment Responders Under Causal Effect Monotonicity »
  Nathan Kallus
- 2019 Oral: Bayesian Optimization of Composite Functions »
  Raul Astudillo · Peter Frazier
- 2019 Oral: Classifying Treatment Responders Under Causal Effect Monotonicity »
  Nathan Kallus
- 2018 Poster: Residual Unfairness in Fair Machine Learning from Prejudiced Data »
  Nathan Kallus · Angela Zhou
- 2018 Oral: Residual Unfairness in Fair Machine Learning from Prejudiced Data »
  Nathan Kallus · Angela Zhou
- 2017 Poster: Recursive Partitioning for Personalization using Observational Data »
  Nathan Kallus
- 2017 Talk: Recursive Partitioning for Personalization using Observational Data »
  Nathan Kallus
- 2017 Poster: Dueling Bandits with Weak Regret »
  Bangrui Chen · Peter Frazier
- 2017 Talk: Dueling Bandits with Weak Regret »
  Bangrui Chen · Peter Frazier