We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL provably learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample-efficient learning algorithms beyond existing results, which typically consider only tabular RL settings or settings that require access to a near-optimal reset distribution. We also demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks.
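To make the objective named in the abstract concrete, the following minimal sketch estimates an Integral Probability Metric from samples: the largest gap, over a class of discriminator functions, between the average discriminator value on expert observations and on learner observations. This is not the FAIL algorithm itself. The function name `empirical_ipm`, the three-function discriminator class, and the scalar observations are all assumptions chosen for illustration; the abstract does not specify them.

```python
import math
from typing import Callable, Sequence

# A discriminator maps an observation to a bounded real value.
# Observations are plain floats here purely for illustration.
Discriminator = Callable[[float], float]


def empirical_ipm(
    expert_obs: Sequence[float],
    learner_obs: Sequence[float],
    discriminators: Sequence[Discriminator],
) -> float:
    """Sample estimate of max over f of |E_expert[f] - E_learner[f]|."""

    def mean(f: Discriminator, xs: Sequence[float]) -> float:
        return sum(f(x) for x in xs) / len(xs)

    return max(
        abs(mean(f, expert_obs) - mean(f, learner_obs))
        for f in discriminators
    )


# Hypothetical finite discriminator class (an assumption, not from the paper).
discriminators = [
    math.tanh,
    math.sin,
    lambda x: 1.0 if x > 0 else 0.0,
]

# Toy observation samples at a single time step (made-up numbers).
expert_obs = [0.9, 1.1, 1.0, 1.2]
learner_near = [1.0, 1.0, 1.1, 0.9]      # resembles the expert
learner_far = [-1.0, -0.8, -1.2, -1.1]   # differs from the expert

gap_near = empirical_ipm(expert_obs, learner_near, discriminators)
gap_far = empirical_ipm(expert_obs, learner_far, discriminators)
print(gap_near < gap_far)
```

A learner whose observation distribution matches the expert's drives this quantity toward zero, which is the sense in which minimizing it encourages imitation without ever seeing the expert's actions.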
Author Information
Wen Sun (Carnegie Mellon University)
Anirudh Vemula (CMU)
Byron Boots (Georgia Tech)
Drew Bagnell (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Poster: Provably Efficient Imitation Learning from Observation Alone »
Thu. Jun 13th 01:30 -- 04:00 AM Room Pacific Ballroom #111
More from the Same Authors
-
2019 Workshop: Real-world Sequential Decision Making: Reinforcement Learning and Beyond »
Hoang Le · Yisong Yue · Adith Swaminathan · Byron Boots · Ching-An Cheng -
2019 Poster: Predictor-Corrector Policy Optimization »
Ching-An Cheng · Xinyan Yan · Nathan Ratliff · Byron Boots -
2019 Poster: Contextual Memory Trees »
Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro -
2019 Oral: Predictor-Corrector Policy Optimization »
Ching-An Cheng · Xinyan Yan · Nathan Ratliff · Byron Boots -
2019 Oral: Contextual Memory Trees »
Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro -
2018 Poster: Recurrent Predictive State Policy Networks »
Ahmed Hefny · Zita Marinho · Wen Sun · Siddhartha Srinivasa · Geoff Gordon -
2018 Oral: Recurrent Predictive State Policy Networks »
Ahmed Hefny · Zita Marinho · Wen Sun · Siddhartha Srinivasa · Geoff Gordon -
2017 Poster: Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction »
Wen Sun · Arun Venkatraman · Geoff Gordon · Byron Boots · Drew Bagnell -
2017 Poster: Prediction under Uncertainty in Sparse Spectrum Gaussian Processes with Applications to Filtering and Control »
Yunpeng Pan · Xinyan Yan · Evangelos Theodorou · Byron Boots -
2017 Talk: Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction »
Wen Sun · Arun Venkatraman · Geoff Gordon · Byron Boots · Drew Bagnell -
2017 Talk: Prediction under Uncertainty in Sparse Spectrum Gaussian Processes with Applications to Filtering and Control »
Yunpeng Pan · Xinyan Yan · Evangelos Theodorou · Byron Boots -
2017 Poster: Safety-Aware Algorithms for Adversarial Contextual Bandit »
Wen Sun · Debadeepta Dey · Ashish Kapoor -
2017 Talk: Safety-Aware Algorithms for Adversarial Contextual Bandit »
Wen Sun · Debadeepta Dey · Ashish Kapoor