This paper studies model-based bandit and reinforcement learning (RL) with nonlinear function approximation. We propose to study convergence to approximate local maxima, because we show that global convergence is statistically intractable even for a one-layer neural net bandit with a deterministic reward. For both nonlinear bandits and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOlin), which provably converges to a local maximum with sample complexity that depends only on the sequential Rademacher complexity of the model class. Our bounds imply novel results for several concrete settings, such as linear bandits with a finite model class or sparse models, and two-layer neural net bandits. A key algorithmic insight is that optimism may lead to over-exploration even for a one-layer neural net model class. On the other hand, for convergence to local maxima, it suffices to maximize the virtual return if the model can also predict the size of the gradient and Hessian of the return.
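To make the high-level idea concrete, below is a minimal, hypothetical sketch of the loop the abstract describes for the bandit case: fit a model of the reward to the data collected so far, ascend the virtual return (the model's predicted reward), and stop when the model's own gradient and Hessian indicate an approximate local maximum. This is not the authors' ViOlin algorithm; the polynomial model class, step size, and stopping test here are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_reward(a):
    # The unknown nonlinear reward; the learner only observes query results.
    return np.sin(3 * a) + 0.5 * a - 0.1 * a ** 2

# Seed the dataset with a few random queries so the model fit is well posed.
actions = list(rng.uniform(-2, 2, size=6))
rewards = [true_reward(a) for a in actions]

a = 0.0
for t in range(50):
    # Stand-in "online model learner": least-squares cubic fit to all data.
    coefs = np.polyfit(actions, rewards, deg=3)
    grad_coefs = np.polyder(coefs)       # model's predicted gradient of the return
    hess_coefs = np.polyder(coefs, 2)    # model's predicted second derivative

    # Virtual ascent: a gradient step on the model's predicted (virtual) return.
    a = float(np.clip(a + 0.1 * np.polyval(grad_coefs, a), -2.0, 2.0))

    # Query the real bandit at the chosen action and record the observation.
    actions.append(a)
    rewards.append(true_reward(a))

    # Stop when the model itself predicts a small gradient and negative
    # curvature, i.e., an approximate local maximum of the virtual return.
    if abs(np.polyval(grad_coefs, a)) < 1e-3 and np.polyval(hess_coefs, a) < 0:
        break

print(f"stopped at action {a:.3f}, true reward {true_reward(a):.3f}")
```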
Author Information
Kefan Dong (Tsinghua University)
Jiaqi Yang (Tsinghua University)
Tengyu Ma (Stanford University)
More from the Same Authors
- 2021: Model-based Offline Reinforcement Learning with Local Misspecification
  Kefan Dong · Ramtin Keramati · Emma Brunskill
- 2023: Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
  Hong Liu · Zhiyuan Li · David Hall · Percy Liang · Tengyu Ma
- 2023 Oral: Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
  Hong Liu · Sang Michael Xie · Zhiyuan Li · Tengyu Ma
- 2023 Poster: Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
  Hong Liu · Sang Michael Xie · Zhiyuan Li · Tengyu Ma
- 2022 Poster: Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation
  Kendrick Shen · Robbie Jones · Ananya Kumar · Sang Michael Xie · Jeff Z. HaoChen · Tengyu Ma · Percy Liang
- 2022 Oral: Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation
  Kendrick Shen · Robbie Jones · Ananya Kumar · Sang Michael Xie · Jeff Z. HaoChen · Tengyu Ma · Percy Liang
- 2021 Poster: Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization
  Sang Michael Xie · Tengyu Ma · Percy Liang
- 2021 Oral: Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization
  Sang Michael Xie · Tengyu Ma · Percy Liang
- 2020 Poster: On the Expressivity of Neural Networks for Deep Reinforcement Learning
  Kefan Dong · Yuping Luo · Tianhe (Kevin) Yu · Chelsea Finn · Tengyu Ma