CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
Abstract
End-to-end autonomous driving models trained with imitation learning (IL) often generalize poorly, particularly in long-tail scenarios where expert demonstrations are sparse. Reinforcement learning (RL) can provide complementary reward signals, but applying RL to real-world autonomous driving is challenging in offline settings: without a simulator, datasets consist almost exclusively of expert actions and lack behavioral diversity. We propose CoIRL-AD, a competitive dual-policy framework that integrates IL and RL under a unified offline training regime. CoIRL-AD decouples IL and RL into separate actors to alleviate the objective conflict between imitation and reward maximization, and introduces a competition-based mechanism that stabilizes learning and enables effective exploration while remaining anchored to expert behavior. Experiments on the nuScenes benchmark show a 27\% relative reduction in collision rate weighted by L2 error compared to strong baselines, with substantially larger gains on cross-city generalization (up to 77\%) and long-tail scenarios (up to 85\%), demonstrating that competitive integration of IL and RL significantly improves robustness in offline end-to-end autonomous driving. Code is available at: \url{https://anonymous.4open.science/r/drive-with-two-minds}.
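To make the dual-actor idea concrete, the following is a minimal conceptual sketch of a competitive dual-policy head over a latent world-model state. It is not the paper's implementation: the class, module names (`il_actor`, `rl_actor`, `critic`), and the critic-based selection rule are all hypothetical, illustrating only the general pattern of two decoupled actors competing per sample.

\begin{verbatim}
import torch
import torch.nn as nn

class CompetitiveDualPolicy(nn.Module):
    """Hypothetical sketch: two decoupled actors (IL and RL) act on a
    shared latent state z; a critic scores both candidate actions and
    the higher-scored actor 'wins' the competition for each sample."""

    def __init__(self, latent_dim: int, action_dim: int):
        super().__init__()
        self.il_actor = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim))
        self.rl_actor = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim))
        # Critic scores (state, action) pairs to arbitrate the competition.
        self.critic = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        a_il, a_rl = self.il_actor(z), self.rl_actor(z)
        q_il = self.critic(torch.cat([z, a_il], dim=-1))
        q_rl = self.critic(torch.cat([z, a_rl], dim=-1))
        # Per-sample competition: execute the action the critic prefers.
        return torch.where(q_rl > q_il, a_rl, a_il)

# Usage: batch of latent world-model states -> planned actions
policy = CompetitiveDualPolicy(latent_dim=128, action_dim=2)
actions = policy(torch.randn(4, 128))
print(actions.shape)  # torch.Size([4, 2])
\end{verbatim}

In such a setup, the IL actor would be trained on expert demonstrations while the RL actor optimizes reward, so the selection step keeps exploration anchored to expert behavior; the paper's actual competition mechanism and training losses are described in later sections.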