While we have witnessed numerous impressive demonstrations of the power of reinforcement learning (RL) algorithms over the years, and much progress has been made on the theoretical side as well, the theoretical understanding of the challenges underlying RL remains rather limited. The best-studied problem settings, such as learning and acting in finite state-action Markov decision processes or simple linear control systems, fail to capture the essential characteristics of more practically relevant problem classes, where the state-action space is often astronomical, the planning horizon is long, the dynamics are complex, interaction with the controlled system may not be permitted, or learning must proceed from heterogeneous offline data. To tackle these diverse issues, theoreticians from a wide range of backgrounds have increasingly turned to RL, proposing new models along with novel developments in both algorithm design and analysis. The workshop's goal is to highlight recent advances in theoretical RL and bring together researchers from different backgrounds to discuss RL theory from multiple perspectives: modeling, algorithms, and analysis.
Invited Speaker: Emilie Kaufmann: On pure-exploration in Markov Decision Processes (Presentation)
Invited Speaker: Christian Kroer: Recent Advances in Iterative Methods for Large-Scale Game Solving (Presentation)
Sparsity in the Partially Controllable LQR (Poster & Spotlight Talk)
On the Theory of Reinforcement Learning with Once-per-Episode Feedback (Poster & Spotlight Talk)
Implicit Finite-Horizon Approximation for Stochastic Shortest Path (Poster & Spotlight Talk)
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning (Poster & Spotlight Talk)
Invited Speaker: Animashree Anandkumar: Stability-aware reinforcement learning in dynamical systems (Presentation)
Invited Speaker: Shie Mannor: Lenient Regret (Presentation)
Social Session (Discussion & Chat)
Poster Session - I (Poster Session)
Invited Speaker: Bo Dai: Leveraging Non-uniformity in Policy Gradient (Presentation)
Invited Speaker: Qiaomin Xie: Reinforcement Learning for Zero-Sum Markov Games Using Function Approximation and Correlated Equilibrium (Presentation)
Bad-Policy Density: A Measure of Reinforcement-Learning Hardness (Poster & Spotlight Talk)
Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games (Poster & Spotlight Talk)
Solving Multi-Arm Bandit Using a Few Bits of Communication (Poster & Spotlight Talk)
CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee (Poster & Spotlight Talk)
Invited Speaker: Art Owen: Empirical likelihood for reinforcement learning (Presentation)
Panel Session: Animashree Anandkumar, Christian Kroer, Art Owen, Qiaomin Xie (Discussion Panel)
Social Session (Discussion & Chat)
Poster Session - II (Poster Session)
Online Sub-Sampling for Reinforcement Learning with General Function Approximation (Poster)
Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation (Poster)
Mixture of Step Returns in Bootstrapped DQN (Poster)
CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee (Poster)
Provably Efficient Multi-Task Reinforcement Learning with Model Transfer (Poster)
Finite time analysis of temporal difference learning with linear function approximation: the tail averaged case (Poster)
On the Sample Complexity of Average-reward MDPs (Poster)
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses (Poster)
Value-Based Deep Reinforcement Learning Requires Explicit Regularization (Poster)
Non-Stationary Representation Learning in Sequential Multi-Armed Bandits (Poster)
Identification and Adaptive Control of Markov Jump Systems: Sample Complexity and Regret Bounds (Poster)
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition (Poster)
Meta Learning MDPs with linear transition models (Poster)
Refined Policy Improvement Bounds for MDPs (Poster)
Learning Stackelberg Equilibria in Sequential Price Mechanisms (Poster)
A Boosting Approach to Reinforcement Learning (Poster)
Improved Estimator Selection for Off-Policy Evaluation (Poster)
Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings (Poster)
Topological Experience Replay for Fast Q-Learning (Poster)
A Short Note on the Relationship of Information Gain and Eluder Dimension (Poster)
Estimating Optimal Policy Value in Linear Contextual Bandits beyond Gaussianity (Poster)
Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs (Poster)
Robust online control with model misspecification (Poster)
Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning (Poster)
A functional mirror ascent view of policy gradient methods with function approximation (Poster)
Invariant Policy Learning: A Causal Perspective (Poster)
Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation (Poster)
Mind the Gap: Safely Bridging Offline and Online Reinforcement Learning (Poster)
A Spectral Approach to Off-Policy Evaluation for POMDPs (Poster)
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret (Poster)
Provably efficient exploration-free transfer RL for near-deterministic latent dynamics (Poster)
Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation (Poster)
Near-Optimal Offline Reinforcement Learning via Double Variance Reduction (Poster)
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings (Poster)
Provable RL with Exogenous Distractors via Multistep Inverse Dynamics (Poster)
Model-Free Approach to Evaluate Reinforcement Learning Algorithms (Poster)
Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits (Poster)
Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation (Poster)
Decentralized Q-Learning in Zero-sum Markov Games (Poster)
Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation (Poster)
Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature (Poster)
Online Learning for Stochastic Shortest Path Model via Posterior Sampling (Poster)
Randomized Least Squares Policy Optimization (Poster)
Statistical Inference with M-Estimators on Adaptively Collected Data (Poster)
Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability (Poster)
Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure (Poster)
Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning (Poster)
Optimal and instance-dependent oracle inequalities for policy evaluation (Poster)
Bagged Critic for Continuous Control (Poster)
Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity (Poster)
Learning to Observe with Reinforcement Learning (Poster)
The Importance of Non-Markovianity in Maximum State Entropy Exploration (Poster)
Finite-Sample Analysis of Off-Policy Natural Actor-Critic With Linear Function Approximation (Poster)
When Is Generalizable Reinforcement Learning Tractable? (Poster)
Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity (Poster)
Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators (Poster)
Nonstationary Reinforcement Learning with Linear Function Approximation (Poster)
On Overconservatism in Offline Reinforcement Learning (Poster)
Collision Resolution in Multi-player Bandits Without Observing Collision Information (Poster)
Subgaussian Importance Sampling for Off-Policy Evaluation and Learning (Poster)
Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks (Poster)
Triple-Q: A Model-Free Algorithm for Constrained Reinforcement Learning with Sublinear Regret and Zero Constraint Violation (Poster)
Minimax Regret for Stochastic Shortest Path (Poster)
Finding the Near Optimal Policy via Reductive Regularization in MDPs (Poster)
Finite Sample Analysis of Average-Reward TD Learning and $Q$-Learning (Poster)
Learning Adversarial Markov Decision Processes with Delayed Feedback (Poster)
A general sample complexity analysis of vanilla policy gradient (Poster)
A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs (Poster)
Is Pessimism Provably Efficient for Offline RL? (Poster)
Efficient Inverse Reinforcement Learning of Transferable Rewards (Poster)
On the Theory of Reinforcement Learning with Once-per-Episode Feedback (Poster)
Implicit Finite-Horizon Approximation for Stochastic Shortest Path (Poster)
Bad-Policy Density: A Measure of Reinforcement-Learning Hardness (Poster)
Gap-Dependent Unsupervised Exploration for Reinforcement Learning (Poster)
Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games (Poster)
Solving Multi-Arm Bandit Using a Few Bits of Communication (Poster)
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning (Poster)
Sparsity in the Partially Controllable LQR (Poster)
Multi-Task Offline Reinforcement Learning with Conservative Data Sharing (Poster)
Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms (Poster)
Bridging The Gap between Local and Joint Differential Privacy in RL (Poster)
Learning Pareto-Optimal Policies in Low-Rank Cooperative Markov Games (Poster)
The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces (Poster)
Model-based Offline Reinforcement Learning with Local Misspecification (Poster)
Reward-Weighted Regression Converges to a Global Optimum (Poster)
Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning (Poster)
Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection (Poster)
Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games (Poster)
Marginalized Operators for Off-Policy Reinforcement Learning (Poster)