Workshop
Sat Jul 24 09:00 AM -- 04:25 PM (PDT)
Workshop on Reinforcement Learning Theory
Shipra Agrawal · Simon Du · Niao He · Csaba Szepesvari · Lin Yang

While over many years we have witnessed numerous impressive demonstrations of the power of various reinforcement learning (RL) algorithms, and while much progress has also been made on the theoretical side, our theoretical understanding of the challenges that underlie RL remains rather limited. The best-studied problem settings, such as learning and acting in finite state-action Markov decision processes or in simple linear control systems, fail to capture the essential characteristics of more practically relevant problem classes, where the size of the state-action space is often astronomical, the planning horizon is huge, the dynamics are complex, interaction with the controlled system is not permitted, or learning must happen from heterogeneous offline data. To tackle these diverse issues, more and more theoreticians with a wide range of backgrounds have come to study RL, proposing numerous new models along with exciting developments in both algorithm design and analysis. The workshop's goal is to highlight advances in theoretical RL and to bring together researchers from different backgrounds to discuss RL theory from multiple perspectives: modeling, algorithms, analysis, and more.

Invited Speaker: Emilie Kaufmann: On pure-exploration in Markov Decision Processes (Presentation)
Invited Speaker: Christian Kroer: Recent Advances in Iterative Methods for Large-Scale Game Solving (Presentation)
Sparsity in the Partially Controllable LQR (Poster & Spotlight Talk)
On the Theory of Reinforcement Learning with Once-per-Episode Feedback (Poster & Spotlight Talk)
Implicit Finite-Horizon Approximation for Stochastic Shortest Path (Poster & Spotlight Talk)
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning (Poster & Spotlight Talk)
Invited Speaker: Animashree Anandkumar: Stability-aware reinforcement learning in dynamical systems (Presentation)
Invited Speaker: Shie Mannor: Lenient Regret (Presentation)
Social Session (Discussion & Chat)
Poster Session - I (Poster Session)
Invited Speaker: Bo Dai: Leveraging Non-uniformity in Policy Gradient (Presentation)
Invited Speaker: Qiaomin Xie: Reinforcement Learning for Zero-Sum Markov Games Using Function Approximation and Correlated Equilibrium (Presentation)
Bad-Policy Density: A Measure of Reinforcement-Learning Hardness (Poster & Spotlight Talk)
Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games (Poster & Spotlight Talk)
Solving Multi-Arm Bandit Using a Few Bits of Communication (Poster & Spotlight Talk)
CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee (Poster & Spotlight Talk)
Invited Speaker: Art Owen: Empirical likelihood for reinforcement learning (Presentation)
Panel Session: Animashree Anandkumar, Christian Kroer, Art Owen, Qiaomin Xie (Discussion Panel)
Social Session (Discussion & Chat)
Poster Session - II (Poster Session)
Learning Stackelberg Equilibria in Sequential Price Mechanisms (Poster)
A Boosting Approach to Reinforcement Learning (Poster)
Improved Estimator Selection for Off-Policy Evaluation (Poster)
Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation (Poster)
Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings (Poster)
A Short Note on the Relationship of Information Gain and Eluder Dimension (Poster)
Estimating Optimal Policy Value in Linear Contextual Bandits beyond Gaussianity (Poster)
Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms (Poster)
The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces (Poster)
A general sample complexity analysis of vanilla policy gradient (Poster)
Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs (Poster)
Topological Experience Replay for Fast Q-Learning (Poster)
Is Pessimism Provably Efficient for Offline RL? (Poster)
Online Sub-Sampling for Reinforcement Learning with General Function Approximation (Poster)
Robust online control with model misspecification (Poster)
Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning (Poster)
A functional mirror ascent view of policy gradient methods with function approximation (Poster)
Invariant Policy Learning: A Causal Perspective (Poster)
Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation (Poster)
Mind the Gap: Safely Bridging Offline and Online Reinforcement Learning (Poster)
A Spectral Approach to Off-Policy Evaluation for POMDPs (Poster)
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret (Poster)
Provably efficient exploration-free transfer RL for near-deterministic latent dynamics (Poster)
Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation (Poster)
Mixture of Step Returns in Bootstrapped DQN (Poster)
Near-Optimal Offline Reinforcement Learning via Double Variance Reduction (Poster)
Bridging The Gap between Local and Joint Differential Privacy in RL (Poster)
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings (Poster)
Learning Pareto-Optimal Policies in Low-Rank Cooperative Markov Games (Poster)
Provable RL with Exogenous Distractors via Multistep Inverse Dynamics (Poster)
Model-Free Approach to Evaluate Reinforcement Learning Algorithms (Poster)
Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits (Poster)
Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation (Poster)
Model-based Offline Reinforcement Learning with Local Misspecification (Poster)
Decentralized Q-Learning in Zero-sum Markov Games (Poster)
Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation (Poster)
Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature (Poster)
Online Learning for Stochastic Shortest Path Model via Posterior Sampling (Poster)
Gap-Dependent Unsupervised Exploration for Reinforcement Learning (Poster)
Randomized Least Squares Policy Optimization (Poster)
Statistical Inference with M-Estimators on Adaptively Collected Data (Poster)
Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability (Poster)
Learning Adversarial Markov Decision Processes with Delayed Feedback (Poster)
Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure (Poster)
Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning (Poster)
Reward-Weighted Regression Converges to a Global Optimum (Poster)
Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning (Poster)
Optimal and instance-dependent oracle inequalities for policy evaluation (Poster)
A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs (Poster)
Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection (Poster)
Bagged Critic for Continuous Control (Poster)
Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity (Poster)
Learning to Observe with Reinforcement Learning (Poster)
Efficient Inverse Reinforcement Learning of Transferable Rewards (Poster)
Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games (Poster)
The Importance of Non-Markovianity in Maximum State Entropy Exploration (Poster)
Finite-Sample Analysis of Off-Policy Natural Actor-Critic With Linear Function Approximation (Poster)
When Is Generalizable Reinforcement Learning Tractable? (Poster)
Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity (Poster)
Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators (Poster)
Nonstationary Reinforcement Learning with Linear Function Approximation (Poster)
On Overconservatism in Offline Reinforcement Learning (Poster)
Marginalized Operators for Off-Policy Reinforcement Learning (Poster)
Collision Resolution in Multi-player Bandits Without Observing Collision Information (Poster)
Subgaussian Importance Sampling for Off-Policy Evaluation and Learning (Poster)
Triple-Q: A Model-Free Algorithm for Constrained Reinforcement Learning with Sublinear Regret and Zero Constraint Violation (Poster)
Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks (Poster)
Finite Sample Analysis of Average-Reward TD Learning and Q-Learning (Poster)
Finding the Near Optimal Policy via Reductive Regularization in MDPs (Poster)
Minimax Regret for Stochastic Shortest Path (Poster)
Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games (Poster)
On the Theory of Reinforcement Learning with Once-per-Episode Feedback (Poster)
Implicit Finite-Horizon Approximation for Stochastic Shortest Path (Poster)
Solving Multi-Arm Bandit Using a Few Bits of Communication (Poster)
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning (Poster)
Sparsity in the Partially Controllable LQR (Poster)
CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee (Poster)
Bad-Policy Density: A Measure of Reinforcement-Learning Hardness (Poster)
Provably Efficient Multi-Task Reinforcement Learning with Model Transfer (Poster)
Multi-Task Offline Reinforcement Learning with Conservative Data Sharing (Poster)
Finite time analysis of temporal difference learning with linear function approximation: the tail averaged case (Poster)
On the Sample Complexity of Average-reward MDPs (Poster)
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses (Poster)
Value-Based Deep Reinforcement Learning Requires Explicit Regularization (Poster)
Non-Stationary Representation Learning in Sequential Multi-Armed Bandits (Poster)
Identification and Adaptive Control of Markov Jump Systems: Sample Complexity and Regret Bounds (Poster)
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition (Poster)
Meta Learning MDPs with linear transition models (Poster)
Refined Policy Improvement Bounds for MDPs (Poster)