Although model-based and model-free approaches to learning the control of systems have achieved impressive results on standard benchmarks, most have been shown to generalize poorly. These methods usually require sampling very large amounts of data from many different environment configurations. We propose a hybrid policy architecture in which a deep network and a shortest-path planner work in unison. The model can be trained end-to-end via blackbox differentiation. The deep network learns to predict time-dependent way-costs such that the planner's internal plans match expert trajectories. These neuro-algorithmic policies generalize well to unseen environment configurations.
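
To make the training idea concrete, the following is a minimal sketch and not the implementation used in this work. It assumes a toy setting: a one-dimensional chain of cells unrolled over time, an agent that may stay or move to a neighbouring cell at each step, and a Hamming loss between the planned and the expert trajectory. A plain cost table stands in for the output of the deep network. The gradient rule follows the scheme commonly described for blackbox differentiation of combinatorial solvers (re-run the solver on costs perturbed along the incoming gradient and use the difference of the two plans). The names `td_shortest_path`, `blackbox_grad`, `lam`, and the step size are hypothetical choices made for illustration.

```python
import numpy as np

def td_shortest_path(costs, start, goal):
    """Min-cost plan over a (T, N) table of time-dependent cell costs.

    Assumed toy dynamics: at each time step the agent stays or moves to an
    adjacent cell. Returns a (T, N) 0/1 indicator matrix of visited cells.
    """
    T, N = costs.shape
    dist = np.full((T, N), np.inf)
    prev = np.zeros((T, N), dtype=int)
    dist[0, start] = costs[0, start]
    # Dynamic programming over the time-expanded graph (a DAG, so
    # negative perturbed costs are unproblematic here).
    for t in range(1, T):
        for n in range(N):
            for m in (n - 1, n, n + 1):
                if 0 <= m < N and dist[t - 1, m] + costs[t, n] < dist[t, n]:
                    dist[t, n] = dist[t - 1, m] + costs[t, n]
                    prev[t, n] = m
    if not np.isfinite(dist[T - 1, goal]):
        raise ValueError("goal is not reachable within the time horizon")
    # Backtrack from the goal at the final time step.
    plan = np.zeros((T, N))
    n = goal
    for t in range(T - 1, -1, -1):
        plan[t, n] = 1.0
        n = prev[t, n]
    return plan

def blackbox_grad(costs, grad_plan, start, goal, lam=5.0):
    """Surrogate gradient of the loss w.r.t. the costs (assumed scheme).

    grad_plan is dLoss/dPlan; lam is a hypothetical interpolation strength.
    """
    plan = td_shortest_path(costs, start, goal)
    perturbed = td_shortest_path(costs + lam * grad_plan, start, goal)
    return -(plan - perturbed) / lam

# Toy problem: staying in cell 0 is initially expensive at t = 1, 2,
# so the planner detours through cell 1, while the expert stays in cell 0.
T, N, start, goal = 4, 3, 0, 0
costs = np.ones((T, N))
costs[1:3, 0] = 5.0
expert = np.zeros((T, N))
expert[:, 0] = 1.0

initial_plan = td_shortest_path(costs, start, goal)

# Gradient descent directly on the cost table (stand-in for network weights).
for _ in range(5):
    grad_plan = 1.0 - 2.0 * expert  # gradient of the Hamming loss w.r.t. the plan
    costs -= 15.0 * blackbox_grad(costs, grad_plan, start, goal, lam=5.0)

final_plan = td_shortest_path(costs, start, goal)
print(initial_plan.argmax(axis=1), final_plan.argmax(axis=1))
```

In this sketch the update raises the costs of cells the planner visited but the expert did not, and lowers the costs of cells the expert visited but the planner missed. Once the plan matches the expert trajectory, the two solver calls agree and the surrogate gradient vanishes.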