In many settings, such as education, healthcare, drug design, robotics, transportation, and achieving better-than-human performance in strategic games, it is important to make decisions sequentially. This poses two interconnected algorithmic and statistical challenges: effectively exploring to learn information about the underlying dynamics, and effectively planning using this information. Reinforcement Learning (RL) is the main paradigm that tackles both of these challenges simultaneously, which is essential in the aforementioned applications. Over the last few years, reinforcement learning has seen enormous progress, both in solidifying our understanding of its theoretical underpinnings and in applying these methods in practice.
This workshop aims to highlight recent theoretical contributions, with an emphasis on addressing significant challenges on the road ahead. Such theoretical understanding is important in order to design algorithms with robust and compelling performance in real-world applications. As part of the ICML 2020 conference, this workshop will be held virtually. It will feature keynote talks from six reinforcement learning experts, each tackling a different significant facet of RL. It will also offer the opportunity for contributed material (see the call for papers and our outstanding program committee below). The authors of each accepted paper will pre-record a 10-minute presentation and will also appear in a poster session. Finally, the workshop will feature a panel discussing important challenges on the road ahead.
Fri 6:30 a.m. - 7:15 a.m. | Exploration, Policy Gradient Methods, and the Deadly Triad - Sham Kakade (Talk)
Practical reinforcement learning algorithms often face the "deadly triad" [Rich Sutton, 2015]: function approximation, data efficiency (e.g. by bootstrapping value function estimates), and exploration (e.g. by off-policy learning). Algorithms which address two without the third are often fine, while trying to address all three leads to highly unstable algorithms in practice. This talk considers a policy gradient approach to alleviate these issues. In particular, we introduce the Policy Cover-Policy Gradient (PC-PG) algorithm, which provably balances the exploration vs. exploitation tradeoff with polynomial sample complexity, using an ensemble of learned policies (the policy cover). We quantify the relevant notion of function approximation through an approximation error term under distribution shift. Furthermore, we will provide simple examples where a number of standard (and provable) RL approaches are less robust when it comes to function approximation. Time permitting, we will discuss the implications this has for more effective data re-use. Joint work with Alekh Agarwal, Mikael Henaff, and Wen Sun.
Sham Kakade
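To make the exploration mechanism concrete: a common way to encourage coverage with an ensemble of policies, in the spirit of (though not necessarily identical to) the PC-PG bonus, is an elliptical bonus computed from the feature covariance under the policy-cover mixture:

$$
b(s,a) \;=\; \beta\,\sqrt{\phi(s,a)^{\top}\,\Sigma_{\mathrm{cover}}^{-1}\,\phi(s,a)},
\qquad
\Sigma_{\mathrm{cover}} \;=\; \lambda I \;+\; \mathbb{E}_{(s,a)\sim d^{\mathrm{mix}}}\big[\phi(s,a)\,\phi(s,a)^{\top}\big],
$$

where $d^{\mathrm{mix}}$ is the state-action distribution of the mixture over the policies collected so far; the next policy added to the cover is then obtained by running policy gradient on the bonus-augmented reward $r + b$.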
Fri 7:20 a.m. - 8:05 a.m. | A Unifying View of Optimism in Episodic Reinforcement Learning - Gergely Neu (Talk)
The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms. In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality, and demonstrates that every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm. Typically, these two classes of algorithms were thought to be distinct, with model-optimistic algorithms benefiting from a cleaner probabilistic analysis while value-optimistic algorithms are easier to implement and thus more practical. With the framework developed in this paper, we show that it is possible to get the best of both worlds by providing a class of algorithms which have a computationally efficient dynamic-programming implementation and also a simple probabilistic analysis. Besides being able to capture many existing algorithms in the tabular setting, our framework can also address large-scale problems under realizable function approximation, where it enables a simple model-based analysis of some recently proposed methods.
Gergely Neu
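For concreteness, a standard instance of the two viewpoints being unified (a generic illustration, not necessarily the exact algorithms treated in the talk): a value-optimistic dynamic programming algorithm performs bonus-augmented backups, while a model-optimistic algorithm maximizes over a confidence set of transition models,

$$
\bar{Q}_h(s,a) \;=\; r(s,a) + b_h(s,a) + \hat{P}(\cdot\mid s,a)^{\top}\bar{V}_{h+1}
\qquad\text{vs.}\qquad
\bar{Q}_h(s,a) \;=\; r(s,a) + \max_{P' \in \mathcal{P}_h(s,a)} P'^{\top}\bar{V}_{h+1},
$$

with $\bar{V}_h(s) = \max_a \bar{Q}_h(s,a)$ in both cases; the Lagrangian-duality argument relates the inner maximization over plausible models to an additive bonus $b_h$.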
Fri 8:10 a.m. - 9:25 a.m. | Poster Session 1 (Poster Session)
Fri 9:30 a.m. - 10:25 a.m. | Speaker Panel (Panel)
Csaba Szepesvari · Martha White · Sham Kakade · Gergely Neu · Shipra Agrawal · Akshay Krishnamurthy
Fri 10:30 a.m. - 11:15 a.m. | An Off-policy Policy Gradient Theorem: A Tale About Weightings - Martha White (Talk)
The goal of the talk is to discuss our recent work on an off-policy policy gradient theorem, and how it can help leverage theory for the on-policy setting for use in the off-policy setting. The key insight is to provide a more general objective for the off-policy setting, one that encompasses the on-policy episodic objective. These simple generalizations make it straightforward to port and generalize ideas.
Martha White
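One way to write such a weighted off-policy objective (an illustrative form, stated here in our own notation rather than the talk's) is

$$
J(\theta) \;=\; \sum_{s} d_{\mu}(s)\, i(s)\, v_{\pi_\theta}(s),
$$

where $d_{\mu}$ is the state distribution induced by the behavior policy $\mu$, $i(\cdot)\ge 0$ is a weighting over states, and $v_{\pi_\theta}$ is the value of the target policy. Choosing the weighting $d_{\mu}\cdot i$ to place all mass on the start-state distribution recovers the standard on-policy episodic objective, which is the sense in which the more general objective encompasses it.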
Fri 11:20 a.m. - 11:35 a.m. | Short Talk 1 - Crush Optimism with Pessimism: Structured Bandits Beyond Asymptotic Optimality (Talk)
We study regret minimization in stochastic structured bandits. The fact that the popular optimistic algorithms do not achieve the asymptotic instance-dependent regret optimality has recently attracted researchers' attention. On the other hand, it is known that one can achieve bounded regret (i.e., regret that does not grow indefinitely with the time horizon T) in certain instances. Unfortunately, existing asymptotically optimal algorithms rely on forced sampling that introduces an ω(1) term w.r.t. the time horizon T in their regret, failing to adapt to the "easiness" of the instance. In this paper, we focus on the finite hypothesis class case and ask if one can achieve the asymptotic optimality while enjoying bounded regret whenever possible. We provide a positive answer via a new algorithm called CRush Optimism with Pessimism (CROP). Our analysis shows that CROP achieves a constant-factor asymptotic optimality and, thanks to the forced-exploration-free design, adapts to bounded regret, and its regret bound scales not with the number of arms but with an effective number of arms that we introduce. We also show that CROP can be exponentially better than existing algorithms in the nonasymptotic regimes. Finally, we observe that even a clairvoyant oracle who plays according to the asymptotically optimal arm pull scheme may suffer a linear worst-case regret, indicating that it may not be the end of optimism. We believe our work may inspire a new family of algorithms for bandits and reinforcement learning. Kwang-Sung Jun, Chicheng Zhang
Kwang-Sung Jun
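The notion of asymptotic instance-dependent optimality referred to here is the standard one for structured bandits (stated in generic form; the talk's precise setting may differ in details): any uniformly good algorithm satisfies

$$
\liminf_{T \to \infty} \frac{\mathbb{E}[\mathrm{Regret}_T]}{\log T} \;\ge\; c^{*}(\theta),
$$

where $c^{*}(\theta)$ is the value of an instance-dependent optimization problem over arm-pull allocations that suffice to distinguish the true instance $\theta$ from confusing alternatives; an algorithm is asymptotically optimal if it matches this constant. Bounded regret is achievable in instances where $c^{*}(\theta) = 0$.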
Fri 11:35 a.m. - 11:50 a.m. | Short Talk 2 - Adaptive Discretization for Model-Based Reinforcement Learning (Talk)
We introduce the technique of adaptive discretization to design efficient model-based episodic reinforcement learning algorithms in large (potentially continuous) state-action spaces. Our algorithm is based on optimistic one-step value iteration extended to maintain an adaptive discretization of the space. From a theoretical perspective, we provide worst-case regret bounds for our algorithm, which are competitive compared to the state-of-the-art RL algorithms; moreover, our bounds are obtained via a modular proof technique, which can potentially extend to incorporate additional structure on the problem. Our algorithm has much lower storage and computational requirements, due to maintaining a more efficient partition of the state and action spaces. We illustrate this via experiments on several canonical control problems, which show that our algorithm empirically performs significantly better than fixed discretization in terms of both faster convergence and lower memory usage. Sean R. Sinclair, Tianyu Wang, Gauri Jain, Sid Banerjee, Christina Yu
Sean R. Sinclair
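To illustrate the core idea of adaptive discretization (refine the partition where data concentrate), here is a minimal, self-contained 1-D sketch. The splitting threshold and the absence of exploration bonuses or value backups are simplifications for illustration, not the paper's algorithm.

```python
# Minimal 1-D illustration of adaptive discretization: refine a cell once it
# has been visited often relative to its width, so frequently visited regions
# get finer resolution. The splitting rule is a simplification, not the rule
# (or the optimistic value iteration) from the paper.
import random

class Cell:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.count = 0      # visits to this cell
        self.q = 1.0        # placeholder optimistic value estimate

def find_cell(cells, s):
    return next(c for c in cells if c.lo <= s < c.hi or (s == 1.0 and c.hi == 1.0))

def observe(cells, s, split_factor=4.0):
    c = find_cell(cells, s)
    c.count += 1
    width = c.hi - c.lo
    if c.count >= split_factor / width:          # threshold scales with 1/width
        mid = (c.lo + c.hi) / 2.0
        cells.remove(c)
        for lo, hi in [(c.lo, mid), (mid, c.hi)]:
            child = Cell(lo, hi)
            child.q = c.q                        # children inherit the parent's estimate
            cells.append(child)

random.seed(0)
cells = [Cell(0.0, 1.0)]
for _ in range(2000):
    observe(cells, random.betavariate(2, 5))     # visited states concentrate near 0.2
print(len(cells), "cells; finest width =", min(c.hi - c.lo for c in cells))
```

Running this produces many narrow cells around the frequently visited region and few wide cells elsewhere, which is the storage saving the abstract points to.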
Fri 11:50 a.m. - 12:05 p.m. | Short Talk 3 - A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces (Talk)
In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric. Using a non-parametric model of the MDP built with time-dependent kernels, we prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP with time, which quantifies its level of non-stationarity. Our method generalizes previous approaches based on sliding windows and exponential discounting used to handle changing environments. Omar Darwiche Domingues, Pierre MENARD, Matteo Pirotta, Emilie Kaufmann, Michal Valko
Omar Darwiche Domingues
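A schematic way to write a time-dependent kernel estimate of the kind described (our notation; the algorithm's exact estimator and normalization may differ) is

$$
\hat{P}_t(\cdot \mid s,a) \;\propto\; \sum_{i<t} \Gamma(t - t_i)\; K\!\big((s,a),(s_i,a_i)\big)\; \delta_{s'_i}(\cdot),
$$

where $K$ is a kernel on the metric state-action space and $\Gamma$ is a temporal weight; taking $\Gamma(u) = \lambda^{u}$ recovers exponential discounting, and $\Gamma(u) = \mathbf{1}\{u \le W\}$ recovers a sliding window of length $W$, the two special cases the abstract mentions.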
Fri 12:05 p.m. - 12:20 p.m. | Short Talk 4 - Adaptive Regret for Online Control (Talk)
We consider regret minimization for online control with time-varying linear dynamical systems. The metric of performance we study is adaptive policy regret, or regret compared to the best policy on any interval in time. We give an efficient algorithm that attains first-order adaptive regret guarantees for the setting of online convex optimization with memory, subsequently used to derive a controller with such guarantees. We show that these bounds are nearly tight and validate these theoretical findings experimentally on simulations of time-varying dynamics and disturbances. Paula Gradu, Elad Hazan, Edgar Minasyan
Edgar Minasyan
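Adaptive regret, the performance metric named in the abstract, is standardly defined as the worst regret over any contiguous interval (generic definition; the control setting adds its own cost and policy-class details):

$$
\mathrm{AdaptiveRegret}(T) \;=\; \max_{1 \le r \le s \le T} \left( \sum_{t=r}^{s} c_t(x_t) \;-\; \min_{\pi \in \Pi} \sum_{t=r}^{s} c_t\big(x_t^{\pi}\big) \right),
$$

so an algorithm with small adaptive regret competes with the best fixed policy on every sub-interval, which is what makes the guarantee meaningful under time-varying dynamics.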
Fri 12:20 p.m. - 12:35 p.m. | Short Talk 5 - Near-Optimal Reinforcement Learning with Self-Play (Talk)
This paper considers the problem of designing optimal algorithms for reinforcement learning in two-player zero-sum games. We focus on self-play algorithms which learn the optimal policy by playing against themselves without any direct supervision. In a tabular episodic Markov game with S states, A max-player actions and B min-player actions, the best existing algorithm for finding an approximate Nash equilibrium requires Õ(S^2 AB) steps of game playing, when only highlighting the dependency on (S, A, B). In contrast, the best existing lower bound scales as Ω(S(A+B)) and has a significant gap from the upper bound. This paper closes this gap for the first time: we propose an optimistic variant of the Nash Q-learning algorithm with sample complexity Õ(SAB), and a new Nash V-learning algorithm with sample complexity Õ(S(A+B)). The latter result matches the information-theoretic lower bound in all problem-dependent parameters except for a polynomial factor of the length of each episode. We complement our upper bounds with a computational hardness result for achieving sublinear regret when playing against adversarial opponents in Markov games. Yu Bai, Chi Jin, Tiancheng Yu
Tiancheng Yu
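The per-state computation underlying Nash Q-learning-style updates is solving a zero-sum matrix game. Below is a standard linear-programming formulation in Python (a generic textbook construction, not code from the paper), assuming scipy is available.

```python
# Nash value of a zero-sum matrix game via linear programming -- the kind of
# per-state subroutine that Nash Q-learning-style updates rely on.
import numpy as np
from scipy.optimize import linprog

def nash_value(A):
    """Row (max) player's mixed strategy x and game value v for payoff matrix A."""
    m, n = A.shape
    # Variables: x_1..x_m (row strategy) and v (game value). Minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every column j of the opponent: v - sum_i x_i A[i, j] <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities sum to one.
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Rock-paper-scissors: uniform strategy, value 0.
x, v = nash_value(np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]]))
print(x, v)
```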
Fri 12:35 p.m. - 12:50 p.m. | Short Talk 6 - Preference learning along multiple criteria: A game-theoretic perspective (Talk)
The literature on ranking from ordinal data is vast, and there are several ways to aggregate overall preferences from pairwise comparisons between objects. In particular, it is well-known that any Nash equilibrium of the zero-sum game induced by the preference matrix defines a natural solution concept (winning distribution over objects) known as a von Neumann winner. Many real-world problems, however, are inevitably multi-criteria, with different pairwise preferences governing the different criteria. In this work, we generalize the notion of a von Neumann winner to the multi-criteria setting by taking inspiration from Blackwell’s approachability. Our framework allows for non-linear aggregation of preferences across criteria, and generalizes the linearization-based approach from multi-objective optimization. From a theoretical standpoint, we show that the Blackwell winner of a multi-criteria problem instance can be computed as the solution to a convex optimization problem. Furthermore, given random samples of pairwise comparisons, we show that a simple, "plug-in" estimator achieves (near-)optimal minimax sample complexity. Finally, we showcase the practical utility of our framework in a user study on autonomous driving, where we find that the Blackwell winner outperforms the von Neumann winner for the overall preferences. Kush Bhatia, Ashwin Pananjady, Peter Bartlett, Anca Dragan, Martin Wainwright
Kush Bhatia
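For reference, the single-criterion solution concept being generalized: given a pairwise preference matrix with entries $P(a \succ b)$, the probability that object $a$ is preferred to object $b$, a distribution $\pi^{\star}$ over objects is a von Neumann winner if

$$
\min_{b} \; \sum_{a} \pi^{\star}(a)\, P(a \succ b) \;\ge\; \tfrac{1}{2},
$$

i.e., it beats every object at least half the time in expectation, and it exists as a Nash equilibrium strategy of the induced zero-sum game mentioned in the abstract. The Blackwell winner of the talk extends this to a vector of preference matrices, one per criterion.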
Fri 1:00 p.m. - 2:15 p.m. | Poster Session 2 (Poster Session)
Fri 2:20 p.m. - 3:05 p.m. | Representation learning and exploration in reinforcement learning - Akshay Krishnamurthy (Talk)
I will discuss new provably efficient algorithms for reinforcement learning in rich observation environments with arbitrarily large state spaces. Both algorithms operate by learning succinct representations of the environment, which they use in an exploration module to acquire new information. The first algorithm, called Homer, operates in a block MDP model and uses a contrastive learning objective to learn the representation. The second algorithm, called FLAMBE, operates in a much richer class of low-rank MDPs and is model based. Both algorithms accommodate nonlinear function approximation and enjoy provable sample and computational efficiency guarantees.
Akshay Krishnamurthy
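The low-rank MDP structure mentioned for FLAMBE is usually written as a factorization of the transition kernel (standard definition, included here for context):

$$
P(s' \mid s, a) \;=\; \big\langle \phi(s,a),\, \mu(s') \big\rangle, \qquad \phi(s,a),\, \mu(s') \in \mathbb{R}^{d},
$$

where neither feature map is known in advance; representation learning amounts to recovering (an approximation of) $\phi$, and a block MDP is the special case in which each observation is emitted by a single latent state.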
Fri 3:10 p.m. - 3:55 p.m. | Learning to price under the Bass model for dynamic demand - Shipra Agrawal (Talk)
We consider a novel formulation of the dynamic pricing and demand learning problem, where the evolution of demand in response to posted prices is governed by a stochastic variant of the popular Bass model with parameters (α, β) that are linked to the so-called "innovation" and "imitation" effects. Unlike the more commonly used i.i.d. demand models, in this model the price posted not only affects the demand and the revenue in the current round but also the evolution of demand, and hence the fraction of market potential that can be captured, in future rounds. Finding a revenue-maximizing dynamic pricing policy in this model is non-trivial even when model parameters are known, and requires solving for the optimal non-stationary policy of a continuous-time, continuous-state MDP. In this paper, we consider a more challenging problem where dynamic pricing is used in conjunction with learning the model parameters, with the objective of optimizing the cumulative revenues over a given selling horizon. Our main contribution is an algorithm with a regret guarantee of O(m^{2/3}), where m is mnemonic for the (known) market size, along with a matching lower bound.
Shipra Agrawal
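For context, the classical (deterministic) Bass diffusion model, of which the talk studies a stochastic, price-controlled variant, describes the adopted fraction $F(t)$ of the market potential by

$$
\frac{dF(t)}{dt} \;=\; \big(\alpha + \beta\, F(t)\big)\,\big(1 - F(t)\big), \qquad F(0) = 0,
$$

where α captures the "innovation" effect (adoption independent of previous adopters) and β the "imitation" effect (adoption driven by previous adopters). Because the posted price modulates this adoption rate, a price affects not only current revenue but also how much of the market potential remains in future rounds, which is the dependence the abstract highlights.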
Fri 4:00 p.m. - 4:45 p.m. | Efficient Planning in Large MDPs with Weak Linear Function Approximation - Csaba Szepesvari (Talk)
Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of "core" states whose features span those of other states. In particular, we make no assumptions about the representability of policies or value functions of non-optimal policies. Our algorithm produces almost-optimal actions for any state using a generative oracle (simulator) for the MDP, while its computation time scales polynomially with the number of features, core states, and actions, and with the effective horizon. I will discuss how this is achieved, a selected part of the vast related literature, and what remains open. Joint work with Roshan Shariff.
Csaba Szepesvari
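The two assumptions in the abstract can be written compactly (a paraphrase in symbols, not the talk's exact conditions): with a feature map $\phi : \mathcal{S} \to \mathbb{R}^{d}$, the optimal value function is nearly linear and every state's features lie in the span of a small core set $\mathcal{C}$,

$$
\big\| V^{*} - \Phi\,\theta^{*} \big\|_{\infty} \;\le\; \varepsilon
\qquad\text{and}\qquad
\phi(s) \;=\; \sum_{c \in \mathcal{C}} \alpha_{c}(s)\,\phi(c) \quad \text{for all } s,
$$

so value estimates computed (via the simulator) at the core states alone determine, through the weights $\alpha_{c}(s)$, value estimates at every other state; this is what lets the runtime depend on $|\mathcal{C}|$ and $d$ rather than on the total number of states.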
Author Information
Emma Brunskill (Stanford University)

Emma Brunskill is a tenured associate professor in the Computer Science Department at Stanford University. Brunskill’s lab aims to create AI systems that learn from few samples to robustly make good decisions, and is part of the Stanford AI Lab, the Stanford Statistical ML group, and AI Safety @Stanford. Brunskill has received an NSF CAREER award, an Office of Naval Research Young Investigator Award, a Microsoft Faculty Fellow award, and an alumni impact award from the computer science and engineering department at the University of Washington. Brunskill and her lab have received multiple best paper nominations and awards, both for their AI and machine learning work (UAI best paper; Reinforcement Learning and Decision Making Symposium best paper, twice) and for their work in AI for education (Intelligent Tutoring Systems Conference, Educational Data Mining conference ×3, CHI).
Thodoris Lykouris (Microsoft Research NYC)
Max Simchowitz (UC Berkeley)
Wen Sun (Cornell University)
Mengdi Wang (Princeton University)