Easier to Judge than to Find: Predicting In-Context Learning Success for Demonstration Selection
Haochun Wang ⋅ Chaofen Yang ⋅ Jiatong Liu ⋅ Jingbo Wang ⋅ Zewen Qiang ⋅ Sendong Zhao ⋅ Ting Liu ⋅ Bing Qin
Abstract
In-context learning (ICL) is highly sensitive to which demonstrations appear in the prompt, but selecting them is expensive because candidate contexts must be validated with repeated LLM calls. We argue that demonstration selection is \emph{easier to judge than to find}: predicting whether a specific query--context pair $(q,D)$ will succeed is cheaper and more general than searching for an optimal $D^\star$. Based on this insight, we propose DiSP, a sample-and-judge framework that stratifies queries by difficulty. DiSP runs random demonstration trials to estimate each training query's success rate, trains a lightweight router to predict difficulty from the query, and trains level-specific judges to score sampled contexts. At inference, DiSP performs stop-on-acceptance judging under an explicit budget and typically makes a single LLM call, emitting diagnostic risk tags when no suitable context is found. Across five classification datasets with Llama-3-8B and Qwen-2.5-7B, DiSP achieves the best average accuracy, improving over strong learned selection baselines by up to 3.4%, while delivering up to 23× end-to-end wall-clock speedup.
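The inference procedure sketched in the abstract (route by difficulty, then judge sampled contexts and stop at the first acceptance) can be illustrated as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: `router`, `judges`, `sample_context`, `llm_call`, and the `threshold` parameter are hypothetical stand-ins for components the abstract names only at a high level.

```python
# Illustrative sketch of a stop-on-acceptance judging loop under an
# explicit budget. All component names here are hypothetical.

def disp_infer(query, router, judges, sample_context, llm_call,
               budget=8, threshold=0.5):
    """Sample candidate demonstration contexts, score each with a
    level-specific judge, and stop at the first context whose predicted
    success score clears the acceptance threshold. If the budget is
    exhausted, answer with the best-scoring context and emit a risk tag."""
    level = router(query)          # predicted difficulty level for the query
    judge = judges[level]          # judge specialized to that level
    best_score, best_ctx = float("-inf"), None
    for _ in range(budget):
        ctx = sample_context(query)
        score = judge(query, ctx)
        if score > best_score:
            best_score, best_ctx = score, ctx
        if score >= threshold:     # stop-on-acceptance: one LLM call
            return llm_call(query, ctx), None
    # No context accepted within budget: fall back to the best candidate
    # seen and attach a diagnostic risk tag.
    return llm_call(query, best_ctx), {"risk": "no_accepted_context",
                                       "best_score": best_score}
```

In the typical case the first sampled context is accepted and exactly one LLM call is made; the risk tag surfaces only when no sampled context clears the judge's threshold within the budget.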