50 Results

Poster
Tue 7:00 Asynchronous Coagent Networks
James Kostas, Chris Nota, Philip Thomas
Poster
Tue 7:00 From Importance Sampling to Doubly Robust Policy Gradient
Jiawei Huang, Nan Jiang
Poster
Tue 7:00 Provably Efficient Exploration in Policy Optimization
Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang
Poster
Tue 8:00 GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
Shangtong Zhang, Bo Liu, Shimon Whiteson
Poster
Tue 8:00 Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition
Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu
Poster
Tue 8:00 Discount Factor as a Regularizer in Reinforcement Learning
Ron Amit, Ron Meir, Kamil Ciosek
Poster
Tue 8:00 Representations for Stable Off-Policy Reinforcement Learning
Dibya Ghosh, Marc Bellemare
Poster
Tue 8:00 Learning Fair Policies in Multi-Objective (Deep) Reinforcement Learning with Average and Discounted Rewards
Umer Siddique, Paul Weng, Matthieu Zimmer
Poster
Tue 9:00 Structured Policy Iteration for Linear Quadratic Regulator
Youngsuk Park, Ryan Rossi, Zheng Wen, Gang Wu, Handong Zhao
Poster
Tue 9:00 Tightening Exploration in Upper Confidence Reinforcement Learning
Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi
Poster
Tue 10:00 Sub-Goal Trees -- a Framework for Goal-Based Reinforcement Learning
Tom Jurgenson, Or Avner, Edward Groshev, Aviv Tamar
Poster
Tue 10:00 Global Concavity and Optimization in a Class of Dynamic Discrete Choice Models
Yiding Feng, Ekaterina Khmelnitskaya, Denis Nekipelov
Poster
Tue 10:00 Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
Amin Rakhsha, Goran Radanovic, Rati Devidze, Jerry Zhu, Adish Singla
Poster
Tue 10:00 Logarithmic Regret for Adversarial Online Control
Dylan Foster, Max Simchowitz
Poster
Tue 12:00 Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
Yao Liu, Pierre-Luc Bacon, Emma Brunskill
Poster
Tue 13:00 Learning with Good Feature Representations in Bandits and in RL with a Generative Model
Tor Lattimore, Csaba Szepesvari, Gellért Weisz
Poster
Tue 14:00 Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently
Asaf Cassel, Alon Cohen, Tomer Koren
Poster
Tue 15:00 Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation
Nathan Kallus, Masatoshi Uehara
Poster
Tue 15:00 Thompson Sampling Algorithms for Mean-Variance Bandits
Qiuyu Zhu, Vincent Tan
Poster
Tue 15:00 Invariant Causal Prediction for Block MDPs
Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup
Poster
Wed 5:00 Best Arm Identification for Cascading Bandits in the Fixed Confidence Setting
Zixin Zhong, Wang Chi Cheung, Vincent Tan
Poster
Wed 5:00 Adaptive Estimator Selection for Off-Policy Evaluation
Yi Su, Pavithra Srinath, Akshay Krishnamurthy
Poster
Wed 8:00 Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
Poster
Wed 8:00 Identifying the Reward Function by Anchor Actions
Sinong Geng, Houssam Nassif, Charlie Manzanares, Max Reppen, Ronnie Sircar
Poster
Wed 9:00 Provable Self-Play Algorithms for Competitive Reinforcement Learning
Yu Bai, Chi Jin
Poster
Wed 10:00 Efficiently Solving MDPs with Stochastic Mirror Descent
Yujia Jin, Aaron Sidford
Poster
Wed 10:00 Provable Representation Learning for Imitation Learning via Bi-level Optimization
Sanjeev Arora, Simon Du, Sham Kakade, Yuping Luo, Nikunj Saunshi
Poster
Wed 11:00 Learning Near Optimal Policies with Low Inherent Bellman Error
Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill
Poster
Wed 12:00 No-Regret Exploration in Goal-Oriented Reinforcement Learning
Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric
Poster
Wed 12:00 Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation
Marc Abeille, Alessandro Lazaric
Poster
Wed 14:00 Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Chen-Yu Wei, Mehdi Jafarnia, Haipeng Luo, Hiteshi Sharma, Rahul Jain
Poster
Wed 15:00 Statistically Efficient Off-Policy Policy Gradients
Nathan Kallus, Masatoshi Uehara
Poster
Wed 16:00 From Chaos to Order: Symmetry and Conservation Laws in Game Dynamics
Sai Ganesh Nagarajan, David Balduzzi, Georgios Piliouras
Poster
Thu 6:00 On the Expressivity of Neural Networks for Deep Reinforcement Learning
Kefan Dong, Yuping Luo, Tianhe Yu, Chelsea Finn, Tengyu Ma
Poster
Thu 6:00 Reward-Free Exploration for Reinforcement Learning
Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu
Poster
Thu 6:00 Reducing Sampling Error in Batch Temporal Difference Learning
Brahma Pavse, Ishan Durugkar, Josiah Hanna, Peter Stone
Poster
Thu 6:00 Bandits for BMO Functions
Tianyu Wang, Cynthia Rudin
Poster
Thu 6:00 Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
Yaqi Duan, Zeyu Jia, Mengdi Wang
Poster
Thu 7:00 Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning
Dipendra Misra, Mikael Henaff, Akshay Krishnamurthy, John Langford
Poster
Thu 7:00 Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
Lin Yang, Mengdi Wang
Poster
Thu 7:00 ConQUR: Mitigating Delusional Bias in Deep Q-Learning
DiJia Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier
Poster
Thu 7:00 Model-Based Reinforcement Learning with Value-Targeted Regression
Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin Yang
Poster
Thu 8:00 Momentum-Based Policy Gradient Methods
Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang
Poster
Thu 8:00 Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation
Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson
Poster
Thu 9:00 On the Global Convergence Rates of Softmax Policy Gradient Methods
Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans
Poster
Thu 12:00 Naive Exploration is Optimal for Online LQR
Max Simchowitz, Dylan Foster
Poster
Thu 12:00 Near-optimal Regret Bounds for Stochastic Shortest Path
Aviv Rosenberg, Alon Cohen, Yishay Mansour, Haim Kaplan
Poster
Thu 14:00 Optimistic Policy Optimization with Bandit Feedback
Lior Shani, Yonathan Efroni, Aviv Rosenberg, Shie Mannor
Poster
Thu 17:00 Minimax Weight and Q-Function Learning for Off-Policy Evaluation
Masatoshi Uehara, Jiawei Huang, Nan Jiang
Poster
Thu 17:00 Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making
Chengchun Shi, Runzhe Wan, Rui Song, Wenbin Lu, Ling Leng