Towards Trustworthy Video Anomaly Understanding: A Class-Guided Chain-of-Evaluation Metric and An Anomaly-focused Meta-Benchmark
Abstract
The trustworthiness of evaluation is critical to reliable model comparison and deployment in Video Anomaly Understanding (VAU). However, existing metrics are sensitive to expression style and to normal (non-anomalous) content, and the field lacks a diagnostic benchmark for assessing metric validity and robustness. To bridge this gap, we propose: (1) a Class-Guided Chain-of-Evaluation (CG-CoE) metric, which structures assessment by extracting anomalous events and matching them under a class-specific semantic tolerance boundary, thereby decoupling anomaly semantics from descriptive style; and (2) an anomaly-focused meta-evaluation benchmark with two subsets: Anomalous Event-level Annotations (AEA), for measuring how faithfully a metric reflects VAU models’ anomaly understanding ability, and Controlled Variant Pairs (CVP), which hold anomalies fixed while varying style, for quantifying robustness to stylistic perturbations. Extensive experiments demonstrate that CG-CoE achieves state-of-the-art validity and robustness.
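To make the extract-then-match idea concrete, the following is a minimal sketch of a two-stage evaluation in the spirit of CG-CoE. The event representation, the token-overlap similarity, and the per-class thresholds are all illustrative assumptions, not the paper's actual implementation (which would use semantic matching rather than lexical overlap).

```python
# Hypothetical sketch of the two-stage idea: (1) anomalous events are extracted
# as (class, description) pairs, (2) predicted events are matched to reference
# events under a class-specific similarity threshold. All names and numbers
# below are illustrative assumptions, not the paper's implementation.

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity, a crude stand-in for semantic matching."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Illustrative class-specific tolerance boundaries (hypothetical values).
CLASS_THRESHOLDS = {"fighting": 0.30, "explosion": 0.40, "default": 0.35}

def match_events(predicted, reference):
    """Greedily match same-class events above the class threshold; return F1."""
    matched_refs, tp = set(), 0
    for p_class, p_text in predicted:
        thr = CLASS_THRESHOLDS.get(p_class, CLASS_THRESHOLDS["default"])
        for i, (r_class, r_text) in enumerate(reference):
            if i in matched_refs or p_class != r_class:
                continue
            if jaccard(p_text, r_text) >= thr:
                matched_refs.add(i)
                tp += 1
                break
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0

pred = [("fighting", "two men fighting near the entrance")]
ref = [("fighting", "two people fighting at a building entrance")]
print(match_events(pred, ref))  # 1.0: descriptions differ in style, same event
```

Because scoring operates on matched anomalous events rather than full captions, stylistic rewording that preserves the anomaly leaves the score unchanged, which is the robustness property the CVP subset is designed to probe.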