The 2021 schedule is still incomplete
Sun Jul 25 01:00 AM -- 01:25 AM (KST)
Invited Speaker: Emilie Kaufmann: On pure-exploration in Markov Decision Processes
Emilie Kaufmann
Sun Jul 25 01:30 AM -- 01:55 AM (KST)
Invited Speaker: Christian Kroer: Recent Advances in Iterative Methods for Large-Scale Game Solving
Christian Kroer
Sun Jul 25 02:00 AM -- 02:12 AM (KST)
Sparsity in the Partially Controllable LQR
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
Sun Jul 25 02:15 AM -- 02:27 AM (KST)
On the Theory of Reinforcement Learning with Once-per-Episode Feedback
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett · Michael Jordan
Sun Jul 25 02:30 AM -- 02:42 AM (KST)
Implicit Finite-Horizon Approximation for Stochastic Shortest Path
Liyu Chen · Mehdi Jafarnia · Rahul Jain · Haipeng Luo
Sun Jul 25 02:45 AM -- 02:57 AM (KST)
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette · Martin Wainwright · Emma Brunskill
Sun Jul 25 03:00 AM -- 03:25 AM (KST)
Invited Speaker: Animashree Anandkumar: Stability-aware reinforcement learning in dynamical systems
Animashree Anandkumar
Sun Jul 25 03:30 AM -- 03:55 AM (KST)
Invited Speaker: Shie Mannor: Lenient Regret
Shie Mannor
Sun Jul 25 04:00 AM -- 04:30 AM (KST)
Social Session
Sun Jul 25 04:30 AM -- 06:00 AM (KST)
Poster Session - I
Sun Jul 25 06:00 AM -- 06:25 AM (KST)
Invited Speaker: Bo Dai: Leveraging Non-uniformity in Policy Gradient
Bo Dai
Sun Jul 25 06:30 AM -- 06:55 AM (KST)
Invited Speaker: Qiaomin Xie: Reinforcement Learning for Zero-Sum Markov Games Using Function Approximation and Correlated Equilibrium
Qiaomin Xie
Sun Jul 25 07:00 AM -- 07:12 AM (KST)
Bad-Policy Density: A Measure of Reinforcement-Learning Hardness
David Abel · Cameron Allen · Dilip Arumugam · D Ellis Hershkowitz · Michael L. Littman · Lawson Wong
Sun Jul 25 07:15 AM -- 07:27 AM (KST)
Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games
Yu Bai · Chi Jin · Huan Wang · Caiming Xiong
Sun Jul 25 07:30 AM -- 07:42 AM (KST)
Solving Multi-Arm Bandit Using a Few Bits of Communication
Osama Hanna · Lin Yang · Christina Fragouli
Sun Jul 25 07:45 AM -- 07:57 AM (KST)
CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee
Tengyu Xu · Yingbin Liang · Guanghui Lan
Sun Jul 25 08:00 AM -- 08:25 AM (KST)
Invited Speaker: Art Owen: Empirical likelihood for reinforcement learning
Sun Jul 25 08:30 AM -- 09:00 AM (KST)
Panel Session: Animashree Anandkumar, Christian Kroer, Art Owen, Qiaomin Xie
Sun Jul 25 09:00 AM -- 09:30 AM (KST)
Social Session
Sun Jul 25 09:30 AM -- 01:00 PM (KST)
Poster Session - II
Sparsity in the Partially Controllable LQR
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
Multi-Task Offline Reinforcement Learning with Conservative Data Sharing
Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Sergey Levine · Chelsea Finn
Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms
Chi Jin · Qinghua Liu · Sobhan Miryoosefi
Bridging The Gap between Local and Joint Differential Privacy in RL
Evrard Garcelon · Vianney Perchet · Ciara Pike-Burke · Matteo Pirotta
Learning Pareto-Optimal Policies in Low-Rank Cooperative Markov Games
Abhimanyu Dubey · Alex 'Sandy' Pentland
The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces
Chi Jin · Qinghua Liu · Tiancheng Yu
Model-based Offline Reinforcement Learning with Local Misspecification
Kefan Dong · Ramtin Keramati · Emma Brunskill
Reward-Weighted Regression Converges to a Global Optimum
Francesco Faccio · Rupesh Kumar Srivastava · Jürgen Schmidhuber
Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning
Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
Matteo Papini · Andrea Tirinzoni · Aldo Pacchiano · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta
Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games
Stefanos Leonardos · Will Overman · Ioannis Panageas · Georgios Piliouras
Marginalized Operators for Off-Policy Reinforcement Learning
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko
Online Sub-Sampling for Reinforcement Learning with General Function Approximation
Dingwen Kong · Ruslan Salakhutdinov · Ruosong Wang · Lin Yang
Mixture of Step Returns in Bootstrapped DQN
PoHan Chiang · Hsuan-Kung Yang · Zhang-Wei Hong · Chun-Yi Lee
CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee
Tengyu Xu · Yingbin Liang · Guanghui Lan
Provably Efficient Multi-Task Reinforcement Learning with Model Transfer
Chicheng Zhang · Zhi Wang
Estimating Optimal Policy Value in Linear Contextual Bandits beyond Gaussianity
Jonathan Lee · Weihao Kong · Aldo Pacchiano · Vidya Muthukumar · Emma Brunskill
Mind the Gap: Safely Bridging Offline and Online Reinforcement Learning
Wanqiao Xu · Kan Xu · Hamsa Bastani · Osbert Bastani
Finite time analysis of temporal difference learning with linear function approximation: the tail averaged case
Gandharv Patil · Prashanth L.A. · Doina Precup
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo · Chen-Yu Wei · Chung-Wei Lee
Value-Based Deep Reinforcement Learning Requires Explicit Regularization
Aviral Kumar · Rishabh Agarwal · Aaron Courville · Tengyu Ma · George Tucker · Sergey Levine
Non-Stationary Representation Learning in Sequential Multi-Armed Bandits
Qin Yuzhen · Tommaso Menara · Samet Oymak · ShiNung Ching · Fabio Pasqualetti
Identification and Adaptive Control of Markov Jump Systems: Sample Complexity and Regret Bounds
Yahya Sattar · Zhe Du · Davoud Ataee Tarzanagh · Necmiye Ozay · Laura Balzano · Samet Oymak
Meta Learning MDPs with linear transition models
Robert Müller · Aldo Pacchiano · Jack Parker-Holder
A Boosting Approach to Reinforcement Learning
Nataly Brukhim · Elad Hazan · Karan Singh
Provable RL with Exogenous Distractors via Multistep Inverse Dynamics
Yonathan Efroni · Dipendra Misra · Akshay Krishnamurthy · Alekh Agarwal · John Langford
Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings
Shunshi Zhang · Murat Erdogdu · Animesh Garg
Topological Experience Replay for Fast Q-Learning
Zhang-Wei Hong · Tao Chen · Yen-Chen Lin · Joni Pajarinen · Pulkit Agrawal
A Short Note on the Relationship of Information Gain and Eluder Dimension
Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei
Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs
Jiafan He · Dongruo Zhou · Quanquan Gu
Robust online control with model misspecification
Xinyi Chen · Udaya Ghai · Elad Hazan · Alexandre Megretsky
Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
Tengyang Xie · Nan Jiang · Huan Wang · Caiming Xiong · Yu Bai
A functional mirror ascent view of policy gradient methods with function approximation
Sharan Vaswani · Olivier Bachem · Simone Totaro · Matthieu Geist · Marlos C. Machado · Pablo Samuel Castro · Nicolas Le Roux
Invariant Policy Learning: A Causal Perspective
Sorawit Saengkyongam · Nikolaj Thams · Jonas Peters · Niklas Pfister
Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation
Yue Guan · Qifan Zhang · Panagiotis Tsiotras
A Spectral Approach to Off-Policy Evaluation for POMDPs
Yash Nair · Nan Jiang
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret
Jean Tarbouriech · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric
Provably efficient exploration-free transfer RL for near-deterministic latent dynamics
Yao Liu · Dipendra Misra · Miroslav Dudik · Robert Schapire
Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation
Jiafan He · Dongruo Zhou · Quanquan Gu
Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
Ming Yin · Yu Bai · Yu-Xiang Wang
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
Ming Yin · Yu-Xiang Wang
Model-Free Approach to Evaluate Reinforcement Learning Algorithms
Denis Belomestny · Ilya Levin · Eric Moulines · Alexey Naumov · Sergey Samsonov · Veronika Zorina
Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits
Wenshuo Guo · Kumar Agrawal · Aditya Grover · Vidya Muthukumar · Ashwin Pananjady
Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
Yue Wu · Dongruo Zhou · Quanquan Gu
Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation
Semih Cayci · Niao He · R Srikant
Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature
Kefan Dong · Jiaqi Yang · Tengyu Ma
Online Learning for Stochastic Shortest Path Model via Posterior Sampling
Mehdi Jafarnia · Liyu Chen · Rahul Jain · Haipeng Luo
Randomized Least Squares Policy Optimization
Haque Ishfaq · Zhuoran Yang · Andrei Lupu · Viet Nguyen · Lewis Liu · Riashat Islam · Zhaoran Wang · Doina Precup
Statistical Inference with M-Estimators on Adaptively Collected Data
Kelly Zhang · Lucas Janson · Susan Murphy
Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability
Dibya Ghosh · Jad Rahme · Aviral Kumar · Amy Zhang · Ryan P. Adams · Sergey Levine
Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure
Aviv Rosenberg · Yishay Mansour
Optimal and instance-dependent oracle inequalities for policy evaluation
Wenlong Mou · Ashwin Pananjady · Martin Wainwright
Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li
Learning to Observe with Reinforcement Learning
Mehmet Koseoglu · Ece Kunduracioglu · Ayca Ozcelikkale
The Importance of Non-Markovianity in Maximum State Entropy Exploration
Mirco Mutti · Riccardo De Santi · Marcello Restelli
Finite-Sample Analysis of Off-Policy Natural Actor-Critic With Linear Function Approximation
Zaiwei Chen · Sajad Khodadadian · Siva Maguluri
When Is Generalizable Reinforcement Learning Tractable?
Dhruv Malik · Yuanzhi Li · Pradeep Ravikumar
Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity
Kaiqing Zhang · Xiangyuan Zhang · Bin Hu · Tamer Basar
Nonstationary Reinforcement Learning with Linear Function Approximation
Huozhi Zhou · Jinglin Chen · Lav Varshney · Ashish Jagmohan
On Overconservatism in Offline Reinforcement Learning
Karush Suri · Florian Shkurti
Collision Resolution in Multi-player Bandits Without Observing Collision Information
Eleni Nisioti · Nikolaos Thomos · Boris Bellalta · Anders Jonsson
Subgaussian Importance Sampling for Off-Policy Evaluation and Learning
Alberto Maria Metelli · Alessio Russo · Marcello Restelli
Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks
Tang Thanh Nguyen · Sunil Gupta · Hung Tran-The · Svetha Venkatesh
Triple-Q: A Model-Free Algorithm for Constrained Reinforcement Learning with Sublinear Regret and Zero Constraint Violation
Honghao Wei · Xin Liu · Lei Ying
Minimax Regret for Stochastic Shortest Path
Alon Cohen · Yonathan Efroni · Yishay Mansour · Aviv Rosenberg
Finding the Near Optimal Policy via Reductive Regularization in MDPs
Wenhao Yang · Xiang Li · Guangzeng Xie · Zhihua Zhang
Finite Sample Analysis of Average-Reward TD Learning and $Q$-Learning
Sheng Zhang · Zhe Zhang · Siva Maguluri
A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs
Andrea Tirinzoni · Matteo Pirotta · Alessandro Lazaric
Refined Policy Improvement Bounds for MDPs
Mark Gluzman
Efficient Inverse Reinforcement Learning of Transferable Rewards
Giorgia Ramponi · Alberto Maria Metelli · Marcello Restelli
On the Sample Complexity of Average-reward MDPs
Yujia Jin
Learning Stackelberg Equilibria in Sequential Price Mechanisms
Gianluca Brero
Improved Estimator Selection for Off-Policy Evaluation
George Tucker
Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning
Sarah Rathnam
Bagged Critic for Continuous Control
Payal Bawa
A general sample complexity analysis of vanilla policy gradient
Rui Yuan · Robert Gower · Alessandro Lazaric
Is Pessimism Provably Efficient for Offline RL?
Ying Jin · Zhuoran Yang · Zhaoran Wang
Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation
Zixiang Chen · Dongruo Zhou · Quanquan Gu
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
Tiancheng Jin · Longbo Huang · Haipeng Luo
Decentralized Q-Learning in Zero-sum Markov Games
Kaiqing Zhang · David Leslie · Tamer Basar · Asuman Ozdaglar
Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators
Zaiwei Chen · Siva Maguluri · Sanjay Shakkottai · Karthikeyan Shanmugam
Learning Adversarial Markov Decision Processes with Delayed Feedback
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour
On the Theory of Reinforcement Learning with Once-per-Episode Feedback
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett · Michael Jordan
Implicit Finite-Horizon Approximation for Stochastic Shortest Path
Liyu Chen · Mehdi Jafarnia · Rahul Jain · Haipeng Luo
Bad-Policy Density: A Measure of Reinforcement-Learning Hardness
David Abel · Cameron Allen · Dilip Arumugam · D Ellis Hershkowitz · Michael L. Littman · Lawson Wong
Gap-Dependent Unsupervised Exploration for Reinforcement Learning
Jingfeng Wu · Vladimir Braverman · Lin Yang
Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games
Yu Bai · Chi Jin · Huan Wang · Caiming Xiong
Solving Multi-Arm Bandit Using a Few Bits of Communication
Osama Hanna · Lin Yang · Christina Fragouli
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette · Martin Wainwright · Emma Brunskill