Neuro-Symbolic World Models for Counterfactual Safety in Automated Driving
Abstract
Latent world models are increasingly used as compact predictors and planning accelerators for embodied control, yet standard world-model metrics do not directly test whether a model is useful as a safety critic for driving actions that were never executed. We study this gap in automated driving by constructing a counterfactual candidate benchmark in MetaDrive: for each logged driving state, the benchmark evaluates the waypoint and speed candidates that a planner could have chosen and records their short-horizon safety outcomes. We propose a neuro-symbolic world model, NeSyWM, that augments a LeWorldModel-style action-conditioned latent predictor with symbolic risk variables, rule-energy constraints, anchor alignment, candidate-ranking losses, and calibrated thresholds consumed by a traceable safety shield. On a CPU validation slice, positive-only latent/risk training can reach very low validation loss while still producing a high rate of false-safe decisions, whereas energy-based symbolic objectives substantially reduce false-safe counterfactual decisions on held-out splits. A visual/raster training run on the full raster export trains both BaseWM and NeSyWM on 194,470 windows under a shared data, checkpoint, and evaluation interface.
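To make the false-safe notion concrete, the following is a minimal sketch of how a thresholded shield could be scored against recorded counterfactual outcomes. It is an illustration under stated assumptions, not the paper's implementation: the `Candidate` type, the scalar `risk_score`, the `shield_accepts` rule, and the choice of unsafe candidates as the denominator are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One counterfactual waypoint/speed candidate (hypothetical structure)."""
    risk_score: float  # assumed critic output; higher means riskier
    unsafe: bool       # recorded short-horizon outcome from the benchmark


def shield_accepts(candidate: Candidate, threshold: float) -> bool:
    """Assumed shield rule: accept a candidate when its risk is below the threshold."""
    return candidate.risk_score < threshold


def false_safe_rate(candidates: List[Candidate], threshold: float) -> float:
    """Fraction of truly unsafe candidates that the shield accepts as safe.

    Assumption: the rate is normalized by the number of unsafe candidates.
    Returns 0.0 when no candidate is unsafe.
    """
    unsafe = [c for c in candidates if c.unsafe]
    if not unsafe:
        return 0.0
    missed = sum(1 for c in unsafe if shield_accepts(c, threshold))
    return missed / len(unsafe)


# Toy data, made up for illustration only.
toy_candidates = [
    Candidate(risk_score=0.1, unsafe=False),
    Candidate(risk_score=0.2, unsafe=True),   # accepted although unsafe
    Candidate(risk_score=0.8, unsafe=True),
    Candidate(risk_score=0.9, unsafe=True),
]
toy_rate = false_safe_rate(toy_candidates, threshold=0.5)
```

Under this definition, a model can fit its training targets closely and still score poorly if it assigns low risk to unsafe candidates, which is the failure mode the benchmark is designed to expose.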