Provably Label-Efficient Conformal Prediction
Andrew Ilyas ⋅ Joonhyuk Ko ⋅ Jingwu Tang ⋅ Steven Wu ⋅ Jiahao Zhang
Abstract
Conformal prediction converts any black-box predictor into one with finite-sample, distribution-free coverage guarantees, outputting prediction sets $T(x)$ that contain the true label with probability at least $1-\alpha$. To construct these prediction sets, conformal prediction relies on a randomly sampled "calibration set" of labeled examples. In many applications, however, this labeled calibration set is costly to collect, creating a tradeoff between the upfront labeling cost and the downstream utility of the conformal predictor. In this work, we study *conformal prediction with costly label queries*, where unlabeled examples arrive i.i.d. and labels can be queried one at a time. After $m$ queries, we form a conformal predictor; the upfront cost of this predictor is the calibration set size $m$, and its efficiency is the expected prediction set size $\mathbb{E}|T_m(X)|$. We design an online stopping rule $\hat{m}$ that automatically balances the upfront cost against conformal efficiency *while preserving the original conformal guarantee*. Theoretically, we show that under mild regularity assumptions, the expected total cost of our stopping rule matches that of the best fixed calibration size in hindsight. Experimentally, we find that our stopping rule reduces total cost by 41.4% $\pm$ 2.3% relative to standard choices of $m$ from the literature. Finally, as a concrete application, we demonstrate a reduction from conformal prediction to the probably approximately correct (PAC) labeling problem of Candès et al. (2025), under which our stopping rule minimizes the total labeling cost.
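To make the setup concrete, here is a minimal sketch of split conformal calibration under costly label queries. The standard finite-sample quantile correction is as in the split conformal literature; everything about the stopping logic is an illustrative assumption, not the paper's rule: the additive cost model $m + \lambda \cdot \mathbb{E}|T_m(X)|$ with weight `lam`, the plug-in set-size estimate from an unlabeled pool, and the patience-based stop, along with the names `label_oracle`, `patience`, and `m_min`, are all hypothetical.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Standard split-conformal threshold: the ceil((m+1)(1-alpha))-th
    smallest nonconformity score among m calibration scores."""
    m = len(scores)
    k = int(np.ceil((m + 1) * (1 - alpha)))
    if k > m:            # too few calibration points for this alpha
        return np.inf    # prediction set degenerates to the full label space
    return np.sort(scores)[k - 1]

def calibrate_with_stopping(stream, label_oracle, model, unlabeled_pool,
                            alpha=0.1, lam=5.0, patience=20, m_min=30):
    """Query labels one at a time; stop once the plug-in total cost
    m + lam * (average prediction-set size) has not improved for
    `patience` consecutive queries. A heuristic sketch, not the paper's rule."""
    pool_probs = [model(x) for x in unlabeled_pool]  # cache unlabeled predictions
    scores, best_cost, since_best, m = [], np.inf, 0, 0
    for x in stream:
        m += 1
        y = label_oracle(x)                  # the costly label query
        scores.append(1.0 - model(x)[y])     # nonconformity score s(x, y) = 1 - p_y
        if m < m_min:
            continue
        qhat = conformal_quantile(scores, alpha)
        # Plug-in estimate of E|T_m(X)|: average set size over the unlabeled pool.
        avg_size = np.mean([np.sum(1.0 - p <= qhat) for p in pool_probs])
        cost = m + lam * avg_size            # upfront cost + weighted inefficiency
        if cost < best_cost:
            best_cost, since_best = cost, 0
        else:
            since_best += 1
        if since_best >= patience:
            break
    return conformal_quantile(scores, alpha), m
```

The returned threshold `qhat` defines $T(x) = \{y : 1 - p_y(x) \le \hat{q}\}$; because the stopping time here depends on the calibration scores themselves, this sketch does not by itself preserve the conformal guarantee, which is precisely the property the paper's stopping rule is designed to retain.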