(#46 / Sess. 2) Discrete Planning with End-to-end Trained Neuro-algorithmic Policies
Marin Vlastelica

Although model-based and model-free approaches to learning the control of systems have achieved impressive results on standard benchmarks, most have been shown to generalize poorly. These methods usually require sampling exhaustive amounts of data from different environment configurations. We propose a hybrid policy architecture in which a deep network and a shortest-path planner work in unison. The model can be trained end-to-end via blackbox differentiation. The deep network learns to predict time-dependent way-costs such that the planner's internal plans match expert trajectories. These neuro-algorithmic policies generalize well to unseen environment configurations.
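
The training idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several assumptions: the cost grid is a directly learnable table standing in for the deep network's output, the costs are static rather than time-dependent, the planner is Dijkstra on a 4-connected grid with vertex costs, and the loss is the Hamming distance between the planned and expert paths. The names `shortest_path`, `blackbox_grad`, and `fit_costs`, and the parameters `lam` and `lr`, are hypothetical. The backward pass follows the general blackbox-differentiation recipe: run the solver a second time on costs perturbed by the incoming gradient and take the scaled difference of the two solutions.

```python
import heapq


def shortest_path(costs, start, goal):
    """Dijkstra on a 4-connected grid with non-negative vertex costs.

    Returns a 0/1 grid marking the cells on the cheapest start-goal path.
    """
    rows, cols = len(costs), len(costs[0])
    dist = {start: costs[start[0]][start[1]]}
    prev = {}
    heap = [(dist[start], start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if d > dist[(r, c)]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + costs[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from the goal to mark the path cells.
    path = [[0] * cols for _ in range(rows)]
    node = goal
    path[node[0]][node[1]] = 1
    while node != start:
        node = prev[node]
        path[node[0]][node[1]] = 1
    return path


def blackbox_grad(costs, start, goal, grad_wrt_path, lam):
    """Gradient of the loss w.r.t. the costs through the solver.

    Solves once on the given costs and once on costs perturbed by
    lam * dL/dpath (clipped at zero so Dijkstra stays valid), then
    returns -(path - perturbed_path) / lam.
    """
    path = shortest_path(costs, start, goal)
    perturbed = [
        [max(c + lam * g, 0.0) for c, g in zip(crow, grow)]
        for crow, grow in zip(costs, grad_wrt_path)
    ]
    path_lam = shortest_path(perturbed, start, goal)
    return [
        [-(a - b) / lam for a, b in zip(prow, lrow)]
        for prow, lrow in zip(path, path_lam)
    ]


def fit_costs(costs, start, goal, expert, steps=5, lam=5.0, lr=3.0):
    """Gradient descent on the cost table so the plan matches the expert."""
    costs = [row[:] for row in costs]  # do not mutate the caller's grid
    # Gradient of the Hamming loss w.r.t. the path indicator: 1 - 2 * expert.
    grad_wrt_path = [[1 - 2 * e for e in row] for row in expert]
    for _ in range(steps):
        grad = blackbox_grad(costs, start, goal, grad_wrt_path, lam)
        costs = [
            [max(c - lr * g, 0.0) for c, g in zip(crow, grow)]
            for crow, grow in zip(costs, grad)
        ]
    return costs


if __name__ == "__main__":
    initial = [[1.0, 1.0, 1.0], [3.0, 3.0, 1.0], [3.0, 3.0, 1.0]]
    expert = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]  # down the left, then right
    learned = fit_costs(initial, (0, 0), (2, 2), expert)
    print(shortest_path(learned, (0, 0), (2, 2)) == expert)
```

In this toy setting the update raises the cost of cells the planner used but the expert avoided, and lowers the cost of cells the expert used but the planner missed. In the architecture described above, the same gradient would instead be backpropagated into the network that predicts the costs.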

Author Information

Marin Vlastelica (Max Planck Institute for Intelligent Systems)
