Timezone: »
Poster
Neural Contextual Bandits with UCB-based Exploration
Dongruo Zhou · Lihong Li · Quanquan Gu
Tue Jul 14 09:00 AM -- 09:45 AM & Tue Jul 14 10:00 PM -- 10:45 PM (PDT) @ Virtual
We study the stochastic contextual bandit problem, where the reward is generated from an unknown function with additive noise. No assumption is made about the reward function other than boundedness. We propose a new algorithm, NeuralUCB, which leverages the representation power of deep neural networks and uses a neural network-based random feature mapping to construct an upper confidence bound (UCB) of reward for efficient exploration. We prove that, under standard assumptions, NeuralUCB achieves $\tilde O(\sqrt{T})$ regret, where $T$ is the number of rounds. To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee. We also show the algorithm is empirically competitive against representative baselines in a number of benchmarks.
Author Information
Dongruo Zhou (UCLA)
Lihong Li (Google Research)
Quanquan Gu (University of California, Los Angeles)
More from the Same Authors
-
2021 : Benign Overfitting in Adversarially Robust Linear Classification »
Jinghui Chen · Yuan Cao · Yuan Cao · Quanquan Gu -
2021 : Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures »
Yuan Cao · Yuan Cao · Quanquan Gu · Mikhail Belkin -
2021 : Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation »
Yue Wu · Dongruo Zhou · Quanquan Gu -
2021 : Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation »
Jiafan He · Dongruo Zhou · Quanquan Gu -
2021 : Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs »
Jiafan He · Dongruo Zhou · Quanquan Gu -
2021 : Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation »
Zixiang Chen · Dongruo Zhou · Quanquan Gu -
2022 : The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2023 : Robust Learning with Progressive Data Expansion Against Spurious Correlation »
Yihe Deng · Yu Yang · Baharan Mirzasoleiman · Quanquan Gu -
2023 : DiffMol: 3D Structured Molecule Generation with Discrete Denoising Diffusion Probabilistic Models »
Weitong Zhang · Xiaoyun Wang · Justin Smith · Joe Eaton · Brad Rees · Quanquan Gu -
2023 : Borda Regret Minimization for Generalized Linear Dueling Bandits »
Yue Wu · Tao Jin · Qiwei Di · Hao Lou · Farzad Farnoud · Quanquan Gu -
2023 Poster: DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design »
Jiaqi Guan · Xiangxin Zhou · Yuwei Yang · Yu Bao · Jian Peng · Jianzhu Ma · Qiang Liu · Liang Wang · Quanquan Gu -
2023 Poster: Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes »
Jiafan He · Heyang Zhao · Dongruo Zhou · Quanquan Gu -
2023 Poster: Cooperative Multi-Agent Reinforcement Learning: Asynchronous Communication and Linear Function Approximation »
Yifei Min · Jiafan He · Tianhao Wang · Quanquan Gu -
2023 Poster: Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron »
Jingfeng Wu · Difan Zou · Zixiang Chen · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2023 Poster: Benign Overfitting in Two-layer ReLU Convolutional Neural Networks »
Yiwen Kou · Zixiang Chen · Yuanzhou Chen · Quanquan Gu -
2023 Poster: Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization »
Chris Junchi Li · Huizhuo Yuan · Gauthier Gidel · Quanquan Gu · Michael Jordan -
2023 Oral: Structure-informed Language Models Are Protein Designers »
Zaixiang Zheng · Yifan Deng · Dongyu Xue · Yi Zhou · Fei YE · Quanquan Gu -
2023 Poster: Personalized Federated Learning under Mixture of Distributions »
Yue Wu · Shuaicheng Zhang · Wenchao Yu · Yanchi Liu · Quanquan Gu · Dawei Zhou · Haifeng Chen · Wei Cheng -
2023 Poster: Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits »
Heyang Zhao · Dongruo Zhou · Jiafan He · Quanquan Gu -
2023 Poster: Structure-informed Language Models Are Protein Designers »
Zaixiang Zheng · Yifan Deng · Dongyu Xue · Yi Zhou · Fei YE · Quanquan Gu -
2023 Poster: The Benefits of Mixup for Feature Learning »
Difan Zou · Yuan Cao · Yuanzhi Li · Quanquan Gu -
2023 Poster: Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs »
Junkai Zhang · Weitong Zhang · Quanquan Gu -
2023 Poster: Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path »
Qiwei Di · Jiafan He · Dongruo Zhou · Quanquan Gu -
2023 Poster: On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits »
Weitong Zhang · Jiafan He · Zhiyuan Fan · Quanquan Gu -
2023 Poster: Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes »
Chenlu Ye · Wei Xiong · Quanquan Gu · Tong Zhang -
2022 Poster: Learning Stochastic Shortest Path with Linear Function Approximation »
Yifei Min · Jiafan He · Tianhao Wang · Quanquan Gu -
2022 Spotlight: Learning Stochastic Shortest Path with Linear Function Approximation »
Yifei Min · Jiafan He · Tianhao Wang · Quanquan Gu -
2022 Poster: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Poster: On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs »
Yuanzhou Chen · Jiafan He · Quanquan Gu -
2022 Oral: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Spotlight: On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs »
Yuanzhou Chen · Jiafan He · Quanquan Gu -
2022 Poster: Dimension-free Complexity Bounds for High-order Nonconvex Finite-sum Optimization »
Dongruo Zhou · Quanquan Gu -
2022 Spotlight: Dimension-free Complexity Bounds for High-order Nonconvex Finite-sum Optimization »
Dongruo Zhou · Quanquan Gu -
2021 : Stochastic Variance-Reduced High-order Optimization for Nonconvex Optimization »
Quanquan Gu -
2021 Workshop: Over-parameterization: Pitfalls and Opportunities »
Yasaman Bahri · Quanquan Gu · Amin Karbasi · Hanie Sedghi -
2021 : RL + Recommender Systems Panel »
Alekh Agarwal · Ed Chi · Maria Dimakopoulou · Georgios Theocharous · Minmin Chen · Lihong Li -
2021 : RL Foundation Panel »
Matthew Botvinick · Thomas Dietterich · Leslie Kaelbling · John Langford · Warrren B Powell · Csaba Szepesvari · Lihong Li · Yuxi Li -
2021 Workshop: Reinforcement Learning for Real Life »
Yuxi Li · Minmin Chen · Omer Gottesman · Lihong Li · Zongqing Lu · Rupam Mahmood · Niranjani Prasad · Zhiwei (Tony) Qin · Csaba Szepesvari · Matthew Taylor -
2021 Poster: On the Convergence of Hamiltonian Monte Carlo with Stochastic Gradients »
Difan Zou · Quanquan Gu -
2021 Spotlight: On the Convergence of Hamiltonian Monte Carlo with Stochastic Gradients »
Difan Zou · Quanquan Gu -
2021 Poster: Near-Optimal Representation Learning for Linear Bandits and Linear RL »
Jiachen Hu · Xiaoyu Chen · Chi Jin · Lihong Li · Liwei Wang -
2021 Poster: Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits »
Tianyuan Jin · Jing Tang · Pan Xu · Keke Huang · Xiaokui Xiao · Quanquan Gu -
2021 Poster: MOTS: Minimax Optimal Thompson Sampling »
Tianyuan Jin · Pan Xu · Jieming Shi · Xiaokui Xiao · Quanquan Gu -
2021 Poster: Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping »
Dongruo Zhou · Jiafan He · Quanquan Gu -
2021 Poster: Logarithmic Regret for Reinforcement Learning with Linear Function Approximation »
Jiafan He · Dongruo Zhou · Quanquan Gu -
2021 Spotlight: Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits »
Tianyuan Jin · Jing Tang · Pan Xu · Keke Huang · Xiaokui Xiao · Quanquan Gu -
2021 Spotlight: Logarithmic Regret for Reinforcement Learning with Linear Function Approximation »
Jiafan He · Dongruo Zhou · Quanquan Gu -
2021 Spotlight: Near-Optimal Representation Learning for Linear Bandits and Linear RL »
Jiachen Hu · Xiaoyu Chen · Chi Jin · Lihong Li · Liwei Wang -
2021 Spotlight: MOTS: Minimax Optimal Thompson Sampling »
Tianyuan Jin · Pan Xu · Jieming Shi · Xiaokui Xiao · Quanquan Gu -
2021 Spotlight: Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping »
Dongruo Zhou · Jiafan He · Quanquan Gu -
2021 Poster: Provable Robustness of Adversarial Training for Learning Halfspaces with Noise »
Difan Zou · Spencer Frei · Quanquan Gu -
2021 Poster: Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins »
Spencer Frei · Yuan Cao · Quanquan Gu -
2021 Poster: Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise »
Spencer Frei · Yuan Cao · Quanquan Gu -
2021 Spotlight: Provable Robustness of Adversarial Training for Learning Halfspaces with Noise »
Difan Zou · Spencer Frei · Quanquan Gu -
2021 Oral: Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins »
Spencer Frei · Yuan Cao · Quanquan Gu -
2021 Spotlight: Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise »
Spencer Frei · Yuan Cao · Quanquan Gu -
2021 Poster: On the Optimality of Batch Policy Optimization Algorithms »
Chenjun Xiao · Yifan Wu · Jincheng Mei · Bo Dai · Tor Lattimore · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2021 Spotlight: On the Optimality of Batch Policy Optimization Algorithms »
Chenjun Xiao · Yifan Wu · Jincheng Mei · Bo Dai · Tor Lattimore · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Poster: A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation »
Pan Xu · Quanquan Gu -
2020 Poster: Batch Stationary Distribution Estimation »
Junfeng Wen · Bo Dai · Lihong Li · Dale Schuurmans -
2020 Poster: Optimization Theory for ReLU Neural Networks Trained with Normalization Layers »
Yonatan Dukler · Quanquan Gu · Guido Montufar -
2019 Workshop: Reinforcement Learning for Real Life »
Yuxi Li · Alborz Geramifard · Lihong Li · Csaba Szepesvari · Tao Wang -
2019 Poster: On the Convergence and Robustness of Adversarial Training »
Yisen Wang · Xingjun Ma · James Bailey · Jinfeng Yi · Bowen Zhou · Quanquan Gu -
2019 Oral: On the Convergence and Robustness of Adversarial Training »
Yisen Wang · Xingjun Ma · James Bailey · Jinfeng Yi · Bowen Zhou · Quanquan Gu -
2019 Poster: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2019 Poster: Lower Bounds for Smooth Nonconvex Finite-Sum Optimization »
Dongruo Zhou · Quanquan Gu -
2019 Oral: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2019 Oral: Lower Bounds for Smooth Nonconvex Finite-Sum Optimization »
Dongruo Zhou · Quanquan Gu -
2018 Poster: Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow »
Xiao Zhang · Simon Du · Quanquan Gu -
2018 Poster: Continuous and Discrete-time Accelerated Stochastic Mirror Descent for Strongly Convex Functions »
Pan Xu · Tianhao Wang · Quanquan Gu -
2018 Oral: Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow »
Xiao Zhang · Simon Du · Quanquan Gu -
2018 Oral: Continuous and Discrete-time Accelerated Stochastic Mirror Descent for Strongly Convex Functions »
Pan Xu · Tianhao Wang · Quanquan Gu -
2018 Poster: A Primal-Dual Analysis of Global Optimality in Nonconvex Low-Rank Matrix Recovery »
Xiao Zhang · Lingxiao Wang · Yaodong Yu · Quanquan Gu -
2018 Poster: Scalable Bilinear Pi Learning Using State and Action Features »
Yichen Chen · Lihong Li · Mengdi Wang -
2018 Poster: Stochastic Variance-Reduced Hamilton Monte Carlo Methods »
Difan Zou · Pan Xu · Quanquan Gu -
2018 Poster: SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation »
Bo Dai · Albert Shaw · Lihong Li · Lin Xiao · Niao He · Zhen Liu · Jianshu Chen · Le Song -
2018 Oral: Stochastic Variance-Reduced Hamilton Monte Carlo Methods »
Difan Zou · Pan Xu · Quanquan Gu -
2018 Oral: Scalable Bilinear Pi Learning Using State and Action Features »
Yichen Chen · Lihong Li · Mengdi Wang -
2018 Oral: A Primal-Dual Analysis of Global Optimality in Nonconvex Low-Rank Matrix Recovery »
Xiao Zhang · Lingxiao Wang · Yaodong Yu · Quanquan Gu -
2018 Oral: SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation »
Bo Dai · Albert Shaw · Lihong Li · Lin Xiao · Niao He · Zhen Liu · Jianshu Chen · Le Song -
2018 Poster: Stochastic Variance-Reduced Cubic Regularized Newton Method »
Dongruo Zhou · Pan Xu · Quanquan Gu -
2018 Poster: Covariate Adjusted Precision Matrix Estimation via Nonconvex Optimization »
Jinghui Chen · Pan Xu · Lingxiao Wang · Jian Ma · Quanquan Gu -
2018 Oral: Stochastic Variance-Reduced Cubic Regularized Newton Method »
Dongruo Zhou · Pan Xu · Quanquan Gu -
2018 Oral: Covariate Adjusted Precision Matrix Estimation via Nonconvex Optimization »
Jinghui Chen · Pan Xu · Lingxiao Wang · Jian Ma · Quanquan Gu -
2017 Poster: Uncertainty Assessment and False Discovery Rate Control in High-Dimensional Granger Causal Inference »
Aditya Chaudhry · Pan Xu · Quanquan Gu -
2017 Poster: High-Dimensional Variance-Reduced Stochastic Gradient Expectation-Maximization Algorithm »
Rongda Zhu · Lingxiao Wang · Chengxiang Zhai · Quanquan Gu -
2017 Poster: Robust Gaussian Graphical Model Estimation with Arbitrary Corruption »
Lingxiao Wang · Quanquan Gu -
2017 Talk: High-Dimensional Variance-Reduced Stochastic Gradient Expectation-Maximization Algorithm »
Rongda Zhu · Lingxiao Wang · Chengxiang Zhai · Quanquan Gu -
2017 Talk: Robust Gaussian Graphical Model Estimation with Arbitrary Corruption »
Lingxiao Wang · Quanquan Gu -
2017 Talk: Uncertainty Assessment and False Discovery Rate Control in High-Dimensional Granger Causal Inference »
Aditya Chaudhry · Pan Xu · Quanquan Gu -
2017 Poster: A Unified Variance Reduction-Based Framework for Nonconvex Low-Rank Matrix Recovery »
Lingxiao Wang · Xiao Zhang · Quanquan Gu -
2017 Poster: Stochastic Variance Reduction Methods for Policy Evaluation »
Simon Du · Jianshu Chen · Lihong Li · Lin Xiao · Dengyong Zhou -
2017 Talk: Stochastic Variance Reduction Methods for Policy Evaluation »
Simon Du · Jianshu Chen · Lihong Li · Lin Xiao · Dengyong Zhou -
2017 Poster: Provably Optimal Algorithms for Generalized Linear Contextual Bandits »
Lihong Li · Yu Lu · Dengyong Zhou -
2017 Talk: A Unified Variance Reduction-Based Framework for Nonconvex Low-Rank Matrix Recovery »
Lingxiao Wang · Xiao Zhang · Quanquan Gu -
2017 Talk: Provably Optimal Algorithms for Generalized Linear Contextual Bandits »
Lihong Li · Yu Lu · Dengyong Zhou