Task-Aligned Stability Analysis of Vision-Language Models for Autonomous Driving Hazard Detection
Abstract
Vision-language models (VLMs) are increasingly used for scene understanding in autonomous driving, but robustness analysis often relies on task-agnostic embedding stability alone. We study whether corruption-induced embedding drift predicts changes in a task-aligned hazard score derived from CLIP image-text similarities. Using controlled corruptions on BDD100K road scenes, we compare embedding drift against margin drift, defined as the change in hazard score under perturbation. The relationship is highly corruption-dependent: some corruption families exhibit strong coupling between representation drift and decision drift, while others induce large decision instability despite relatively modest embedding change. A scatter analysis reveals a structured upper envelope relating margin drift to embedding drift, which suggests that representation change constrains but does not determine decision instability. These results indicate that robustness benchmarks should include task-aligned stability measures in addition to embedding-level perturbation statistics.
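To make the two quantities compared above concrete, the following is a minimal sketch under stated assumptions, not the definitions used in this work. It assumes that embedding drift is one minus the cosine similarity between clean and corrupted image embeddings, that the hazard score is the mean similarity to a set of hazard prompts minus the mean similarity to a set of safe prompts, and that margin drift is the absolute change in that score. The function names and the prompt-set construction are hypothetical; the embeddings are plain lists of floats standing in for CLIP outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hazard_score(image_emb, hazard_text_embs, safe_text_embs):
    """Assumed score: mean similarity to hazard prompts minus mean similarity to safe prompts."""
    hazard = sum(cosine(image_emb, t) for t in hazard_text_embs) / len(hazard_text_embs)
    safe = sum(cosine(image_emb, t) for t in safe_text_embs) / len(safe_text_embs)
    return hazard - safe


def embedding_drift(clean_emb, corrupted_emb):
    """Assumed drift measure: cosine distance between clean and corrupted image embeddings."""
    return 1.0 - cosine(clean_emb, corrupted_emb)


def margin_drift(clean_emb, corrupted_emb, hazard_text_embs, safe_text_embs):
    """Absolute change in the (assumed) hazard score under perturbation."""
    clean = hazard_score(clean_emb, hazard_text_embs, safe_text_embs)
    corrupted = hazard_score(corrupted_emb, hazard_text_embs, safe_text_embs)
    return abs(corrupted - clean)
```

Under these assumptions, plotting `margin_drift` against `embedding_drift` for each clean and corrupted image pair, grouped by corruption family, would produce the kind of scatter analysis described above.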