Timezone: »
Poster
Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR
Kaiwen Wang · Nathan Kallus · Wen Sun
In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-arm bandits (MABs), we show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes, and that it is achieved by an Upper Confidence Bound algorithm with a novel Bernstein bonus. For online RL in tabular Markov Decision Processes (MDPs), we show a minimax regret lower bound of $\Omega(\sqrt{\tau^{-1}SAK})$ (with normalized cumulative rewards), where $S$ is the number of states, and we propose a novel bonus-driven Value Iteration procedure. We show that our algorithm achieves the optimal regret of $\widetilde O(\sqrt{\tau^{-1}SAK})$ under a continuity assumption and in general attains a near-optimal regret of $\widetilde O(\tau^{-1}\sqrt{SAK})$, which is minimax-optimal for constant $\tau$. This improves on the best available bounds. By discretizing rewards appropriately, our algorithms are computationally efficient.
Author Information
Kaiwen Wang (Cornell Tech)
Nathan Kallus (Cornell University)
Wen Sun (Cornell University)
More from the Same Authors
-
2021 : Corruption Robust Offline Reinforcement Learning »
Xuezhou Zhang · Yiding Chen · Jerry Zhu · Wen Sun -
2021 : Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage »
Jonathan Chang · Masatoshi Uehara · Dhruv Sreenivas · Rahul Kidambi · Wen Sun -
2021 : MobILE: Model-Based Imitation Learning From Observation Alone »
Rahul Kidambi · Jonathan Chang · Wen Sun -
2023 : Representation Learning in Low-rank Slate-based Recommender Systems »
Yijia Dai · Wen Sun -
2023 : Provable Offline Reinforcement Learning with Human Feedback »
Wenhao Zhan · Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun -
2023 : Contextual Bandits and Imitation Learning with Preference-Based Active Queries »
Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu -
2023 : Selective Sampling and Imitation Learning via Online Regression »
Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu -
2023 : Provable Offline Reinforcement Learning with Human Feedback »
Wenhao Zhan · Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun -
2023 : How to Query Human Feedback Efficiently in RL? »
Wenhao Zhan · Masatoshi Uehara · Wen Sun · Jason Lee -
2023 : Contextual Bandits and Imitation Learning with Preference-Based Active Queries »
Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu -
2023 : How to Query Human Feedback Efficiently in RL? »
Wenhao Zhan · Masatoshi Uehara · Wen Sun · Jason Lee -
2023 Poster: Smooth Non-stationary Bandits »
Su Jia · Qian Xie · Nathan Kallus · Peter Frazier -
2023 Poster: Multi-task Representation Learning for Pure Exploration in Linear Bandits »
Yihan Du · Longbo Huang · Wen Sun -
2023 Poster: Distributional Offline Policy Evaluation with Predictive Error Guarantees »
Runzhe Wu · Masatoshi Uehara · Wen Sun -
2023 Poster: Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings »
Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun -
2023 Poster: B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding »
Miruna Oprescu · Jacob Dorn · Marah Ghoummaid · Andrew Jesson · Nathan Kallus · Uri Shalit -
2022 Poster: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach »
Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun -
2022 Poster: Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning »
Nathan Kallus · Xiaojie Mao · Kaiwen Wang · Zhengyuan Zhou -
2022 Poster: Learning Bellman Complete Representations for Offline Policy Evaluation »
Jonathan Chang · Kaiwen Wang · Nathan Kallus · Wen Sun -
2022 Spotlight: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach »
Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun -
2022 Spotlight: Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning »
Nathan Kallus · Xiaojie Mao · Kaiwen Wang · Zhengyuan Zhou -
2022 Oral: Learning Bellman Complete Representations for Offline Policy Evaluation »
Jonathan Chang · Kaiwen Wang · Nathan Kallus · Wen Sun -
2021 Poster: Fairness of Exposure in Stochastic Bandits »
Luke Lequn Wang · Yiwei Bai · Wen Sun · Thorsten Joachims -
2021 Spotlight: Fairness of Exposure in Stochastic Bandits »
Luke Lequn Wang · Yiwei Bai · Wen Sun · Thorsten Joachims -
2021 Poster: Robust Policy Gradient against Strong Data Corruption »
Xuezhou Zhang · Yiding Chen · Jerry Zhu · Wen Sun -
2021 Poster: Optimal Off-Policy Evaluation from Multiple Logging Policies »
Nathan Kallus · Yuta Saito · Masatoshi Uehara -
2021 Spotlight: Optimal Off-Policy Evaluation from Multiple Logging Policies »
Nathan Kallus · Yuta Saito · Masatoshi Uehara -
2021 Spotlight: Robust Policy Gradient against Strong Data Corruption »
Xuezhou Zhang · Yiding Chen · Jerry Zhu · Wen Sun -
2021 Poster: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang -
2021 Oral: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang -
2021 Poster: PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration »
Yuda Song · Wen Sun -
2021 Spotlight: PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration »
Yuda Song · Wen Sun -
2020 Poster: Statistically Efficient Off-Policy Policy Gradients »
Nathan Kallus · Masatoshi Uehara -
2020 Poster: DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training »
Nathan Kallus -
2020 Poster: Efficient Policy Learning from Surrogate-Loss Classification Reductions »
Andrew Bennett · Nathan Kallus -
2020 Poster: Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation »
Nathan Kallus · Masatoshi Uehara -
2019 Poster: Classifying Treatment Responders Under Causal Effect Monotonicity »
Nathan Kallus -
2019 Oral: Classifying Treatment Responders Under Causal Effect Monotonicity »
Nathan Kallus -
2018 Poster: Residual Unfairness in Fair Machine Learning from Prejudiced Data »
Nathan Kallus · Angela Zhou -
2018 Oral: Residual Unfairness in Fair Machine Learning from Prejudiced Data »
Nathan Kallus · Angela Zhou -
2017 Poster: Recursive Partitioning for Personalization using Observational Data »
Nathan Kallus -
2017 Talk: Recursive Partitioning for Personalization using Observational Data »
Nathan Kallus