PRISM: Training-Free Video Anomaly Detection via Intrinsic Statistical Modeling
Abstract
While emerging training-free video anomaly detection (VAD) methods offer advantages such as interpretability and ease of deployment, they often suffer from computational inefficiency due to complex memory retrieval mechanisms or high-latency vision-language models (VLMs). To address this, we propose PRISM (Parameter-free Recognition based on Intrinsic Statistical Modeling), a novel framework for efficient open-set anomaly detection with minimal computational cost. Built on a pre-trained multimodal embedding model, PRISM introduces differential amplification and whitening mechanisms that statistically suppress common-mode background noise in the embedding space, thereby significantly improving the signal-to-noise ratio of anomalous events. Extensive experiments on three mainstream datasets demonstrate that PRISM achieves state-of-the-art performance together with real-time inference and interpretability. Furthermore, our statistical model provides a theoretical explanation for the performance gap (particularly in average precision (AP)) observed in existing training-free methods on complex datasets such as XD-Violence.
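To make the whitening idea concrete, the following is a minimal sketch, not the paper's exact method: it assumes we can estimate background ("common-mode") statistics from embeddings of normal clips, whiten new embeddings against those statistics, and score anomalies by the norm of the whitened residual. The function names, the Mahalanobis-style score, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def fit_whitener(normal_embeddings, eps=1e-5):
    """Estimate the mean and an inverse square-root covariance
    (whitening matrix) from embeddings of normal clips."""
    mu = normal_embeddings.mean(axis=0)
    centered = normal_embeddings - mu
    cov = centered.T @ centered / len(normal_embeddings)
    # Eigendecomposition of the symmetric covariance; eps guards
    # against near-zero eigenvalues.
    vals, vecs = np.linalg.eigh(cov)
    w = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return mu, w

def anomaly_score(embedding, mu, w):
    """Mahalanobis-style score: norm of the whitened residual.
    Common-mode background variation is suppressed by w, so
    off-distribution directions dominate the score."""
    return float(np.linalg.norm(w @ (embedding - mu)))

# Synthetic stand-in for normal-clip embeddings.
rng = np.random.default_rng(0)
normal = rng.normal(size=(500, 16))
mu, w = fit_whitener(normal)

typical = anomaly_score(normal[0], mu, w)
outlier = anomaly_score(normal[0] + 10.0, mu, w)  # shifted off-distribution
print(typical < outlier)  # the shifted embedding scores higher
```

In this toy setting the score is parameter-free in the sense the abstract describes: nothing is trained, only first- and second-order statistics of the normal embeddings are estimated.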