
Causal Imitation Learning under Temporally Correlated Noise
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu

Thu Jul 21 03:00 PM -- 05:00 PM (PDT) @ Hall E #825

We develop algorithms for imitation learning from policy data that was corrupted by temporally correlated noise in expert actions. When noise affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch onto, leading to poor policy performance. To break up these spurious correlations, we apply modern variants of the instrumental variable regression (IVR) technique from econometrics, enabling us to recover the underlying policy without requiring access to an interactive expert. In particular, we present two techniques: one of a generative-modeling flavor (DoubIL) that can use access to a simulator, and one of a game-theoretic flavor (ResiduIL) that can be run entirely offline. We find that both of our algorithms compare favorably to behavioral cloning on simulated control tasks.
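To make the instrumental-variable idea concrete, here is a minimal sketch that uses the classic linear, just-identified IV estimator (the simplest case of two-stage least squares) rather than DoubIL or ResiduIL. Everything about the setup is a hypothetical assumption chosen for illustration: a scalar linear system, a linear expert with coefficient theta = -0.5, and action noise u_t = e_t + e_{t-1} that persists across two timesteps. Because the state s_t already absorbed e_{t-1} through the previous action, regressing actions on states (behavioral cloning) picks up the spurious correlation. The lagged state s_{t-1}, fixed before that noise was drawn, serves as the instrument.

```python
import random


def simulate(theta=-0.5, horizon=50_000, seed=0):
    """Roll out a hypothetical scalar system under a noisy linear expert.

    Assumed dynamics: s_{t+1} = 0.9 * s_t + a_t + w_t, with w_t ~ N(0, 1).
    Assumed expert:   a_t = theta * s_t + u_t, where u_t = e_t + e_{t-1}
    is MA(1) noise, i.e. correlated across two consecutive timesteps.
    """
    rng = random.Random(seed)
    s, e_prev = 0.0, 0.0
    states, actions = [], []
    for _ in range(horizon):
        e = rng.gauss(0.0, 1.0)
        u = e + e_prev  # temporally correlated action noise
        a = theta * s + u
        states.append(s)
        actions.append(a)
        s = 0.9 * s + a + rng.gauss(0.0, 1.0)  # next state absorbs u
        e_prev = e
    return states, actions


def behavioral_cloning(states, actions):
    """Least-squares fit of a_t on s_t with no instrument (biased here)."""
    num = sum(s * a for s, a in zip(states, actions))
    den = sum(s * s for s in states)
    return num / den


def iv_regression(states, actions, lag=1):
    """Just-identified IV estimate with instrument z_t = s_{t-lag}.

    Under the assumed MA(1) noise, s_{t-1} was determined before e_{t-1}
    and e_t were drawn, so it is uncorrelated with u_t while still
    predicting s_t through the dynamics.
    """
    z = states[:-lag]
    s = states[lag:]
    a = actions[lag:]
    num = sum(zi * ai for zi, ai in zip(z, a))
    den = sum(zi * si for zi, si in zip(z, s))
    return num / den


if __name__ == "__main__":
    states, actions = simulate()
    print("behavioral cloning estimate:", behavioral_cloning(states, actions))
    print("IV estimate (lagged state):  ", iv_regression(states, actions))
```

In this toy setup the behavioral cloning estimate is pulled away from -0.5 because s_t and u_t share e_{t-1}, while the IV estimate lands near -0.5. The lag has to outlast the noise's memory: if the noise were MA(k), i.e. correlated across k + 1 timesteps, the instrument s_{t-lag} would need lag of at least k under these assumptions.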

Author Information

Gokul Swamy (Carnegie Mellon University)
Sanjiban Choudhury (Cornell University)
James Bagnell (Aurora Innovation)

Drew has worked for two decades at the intersection of machine learning and robotics, both as a faculty member at Carnegie Mellon University and through engagements with industry, from self-driving haul trucks to perception architecture for Uber’s self-driving cars, and now in his current role as CTO of Aurora Innovation.

Steven Wu (Carnegie Mellon University)
