Detecting Errors in AI-Generated Annotations: When and Why Semantic Neighbors Help
Na Di ⋅ Ling Li ⋅ Zhe Tang ⋅ Hao Cheng ⋅ Jinlong Pang ⋅ Jiaheng Wei ⋅ Zhaowei Zhu
Abstract
Large language models (LLMs) and vision-language models (VLMs) have emerged as efficient annotators for tasks such as generation and classification. While these models offer significant cost and speed advantages over human annotation, a critical challenge remains: existing self-evaluation methods, such as LLM-as-judge, often lack reliable calibration signals for error detection. We address this limitation by introducing **SAGE** (**S**emantic-**A**nchored Jud**G**m**E**nt), a method that leverages semantically similar samples retrieved via $k$-nearest-neighbor search as references for annotation verification. We provide a theoretical framework that derives a closed-form expression for the error-detection AUROC, which decomposes into three factors: intrinsic separability, reference-induced mean shift, and noise reduction through averaging. This decomposition reveals *when* semantic neighbors help (when references are both semantically matched and correct) and *why* (by providing calibration signals that raise scores for correct annotations and lower scores for incorrect ones). Experiments on LLM generation, VLM captioning, and classification tasks validate our theoretical framework: SAGE improves error detection when semantic neighbors provide reliable calibration signals, and our decomposition offers insights into when direct scoring or alternative strategies may be preferred.
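
To make the retrieval-and-calibration idea concrete, below is a minimal sketch of how a semantic-anchored score might be computed. The function name `sage_score`, the inputs (`query_emb`, `ref_judge_scores`, etc.), and the subtraction-based calibration are our own illustration of the described mechanism, not the authors' released code.

```python
import numpy as np

def sage_score(query_emb, query_judge_score, ref_embs, ref_judge_scores, k=5):
    """Illustrative sketch (hypothetical API): calibrate a judge score for one
    annotation against its k semantically nearest reference annotations."""
    # Cosine similarity between the query embedding and every reference embedding.
    sims = ref_embs @ query_emb / (
        np.linalg.norm(ref_embs, axis=1) * np.linalg.norm(query_emb) + 1e-12
    )
    # Indices of the k most semantically similar references.
    nn_idx = np.argsort(-sims)[:k]
    # Averaging over k reference scores reduces the variance of the calibration
    # signal (the "noise reduction through averaging" factor in the abstract).
    reference_level = ref_judge_scores[nn_idx].mean()
    # Calibrated score: how the query's judge score compares to the scores that
    # semantically matched, presumed-correct references receive.
    return query_judge_score - reference_level
```

In use, an annotation would be flagged as likely erroneous when its calibrated score falls below a threshold; error-detection AUROC is then computed over these calibrated scores rather than over raw judge scores.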
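
The three-factor decomposition can be pictured under a standard binormal model; the form below is our schematic reconstruction under stated Gaussian assumptions, not the paper's stated theorem, and the symbols ($\mu_1, \mu_0, \delta_1, \delta_0, \sigma_r$) are our own.

```latex
% Assume calibrated scores for correct annotations are
% N(\mu_1 + \delta_1, \sigma_1^2 + \sigma_r^2/k) and for incorrect annotations
% N(\mu_0 - \delta_0, \sigma_0^2 + \sigma_r^2/k), where \delta_1, \delta_0 are
% reference-induced mean shifts and \sigma_r^2/k is the averaged reference noise.
\mathrm{AUROC}
  = \Phi\!\left(
      \frac{(\mu_1 - \mu_0) \;+\; (\delta_1 + \delta_0)}
           {\sqrt{\sigma_1^2 + \sigma_0^2 + 2\,\sigma_r^2/k}}
    \right)
```

Under this schematic, $\mu_1 - \mu_0$ is the intrinsic separability, $\delta_1 + \delta_0$ the reference-induced mean shift, and $\sigma_r^2/k$ the noise reduction through averaging: neighbors help exactly when the shifts are positive (references are matched and correct) and $k$ is large enough for the reference noise to wash out.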