Ensuring Calibration Robustness in Split Conformal Prediction Under Adversarial Attacks
Abstract
Conformal prediction (CP) provides distribution-free, finite-sample coverage guarantees but critically relies on exchangeability, a condition often violated under distribution shift. We study the robustness of split conformal prediction under adversarial perturbations at test time, focusing on both coverage validity and the efficiency of the resulting prediction sets. Our theoretical analysis characterizes how the strength of adversarial perturbations applied during calibration affects the coverage gap relative to the nominal coverage level under adversarial test conditions. We further examine the impact of adversarial training at the model-training stage. Experiments support our theory: (i) prediction coverage varies monotonically with the calibration-time attack strength, so a nonzero calibration-time attack can be used to predictably control coverage under adversarial test conditions; (ii) the marginal coverage can remain within a user-specified tolerance band around the nominal coverage level; and (iii) adversarial training at the training stage produces tighter prediction sets, improving efficiency while maintaining coverage validity.
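To make the calibration-time mechanism concrete, the following is a minimal illustrative sketch and not the implementation used in this work. It assumes a fixed linear regressor, the absolute-residual nonconformity score, and an l_inf-bounded attack, for which the worst-case score has a closed form. The weights, the synthetic data, and the names `conformal_quantile`, `worst_case_scores`, and `coverage_under_attack` are all hypothetical choices made for illustration.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:  # too few calibration points for this alpha
        return np.inf
    return np.sort(scores)[k - 1]


def worst_case_scores(X, y, w, eps):
    """Largest absolute residual of the linear model x -> x @ w over all
    perturbations with l_inf norm at most eps (closed form for a linear model)."""
    return np.abs(y - X @ w) + eps * np.abs(w).sum()


def coverage_under_attack(eps_cal, eps_test, alpha=0.1,
                          n_cal=1000, n_test=5000, seed=0):
    """Empirical test coverage when calibration scores are computed under an
    attack of strength eps_cal and test inputs are attacked with eps_test."""
    rng = np.random.default_rng(seed)
    w = np.array([1.0, -2.0, 0.5])  # hypothetical fixed model weights
    X_cal = rng.normal(size=(n_cal, 3))
    y_cal = X_cal @ w + rng.normal(size=n_cal)
    X_test = rng.normal(size=(n_test, 3))
    y_test = X_test @ w + rng.normal(size=n_test)

    q_hat = conformal_quantile(worst_case_scores(X_cal, y_cal, w, eps_cal), alpha)
    test_scores = worst_case_scores(X_test, y_test, w, eps_test)
    return float(np.mean(test_scores <= q_hat))


if __name__ == "__main__":
    # Sweep the calibration-time attack strength for a fixed test-time attack.
    for eps_cal in (0.0, 0.05, 0.1, 0.2):
        print(eps_cal, coverage_under_attack(eps_cal, eps_test=0.1))
```

In this toy setting the threshold grows with `eps_cal`, so the empirical coverage under a fixed test-time attack is non-decreasing in the calibration-time attack strength. Matching `eps_cal` to `eps_test` makes the calibration and test scores exchangeable again, which restores the nominal marginal guarantee.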