Poster in Workshop: Combining Theory and Benchmarks: Towards A Virtuous Cycle to Understand and Guarantee Foundation Model Performance
Thu, Jul 9, 2026 • 7:00 PM – 8:00 PM PDT

The Prompt Is the Analytic Choice: Specification Curve Analysis for LLM-Based Social Science

Jacob Crainic ⋅ Brandon Yee ⋅ Pairie Koh

Abstract

Large language models are widely used as synthetic survey respondents, yet the prompts that elicit their responses rest on choices of model, persona, framing, system register, temperature, and few-shot count that go undisclosed. This carries the analytic-flexibility problem of the credibility revolution into the elicitation stage. We develop Prompt Specification Curve Analysis (P-SCA), which enumerates defensible prompts, decomposes response variance across prompt dimensions with $\eta^2$, and tests whether one dimension dominates another via the Fisher $r$-to-$z$ transformation. Applied to a 2,592-cell multiverse across six LLMs with 600 specifications on three 2024 ANES items, P-SCA shows that the partisan signal survives on every item ($p < 0.0001$; directional consistency of 95%, 95%, and 83%), though sensitivity to prompt choices is topic-contingent. On gun control, question framing accounts for about 2.5 times more variance than any other dimension ($\eta^2 = 0.160$ versus $0.065$ for model; $z = +2.65$, $p = 0.008$), whereas model choice dominates on the other two items. Observed coverage exceeds a permutation-derived threshold near 49% by 34 to 46 percentage points, and LLM partisan gaps exceed those in ANES 2024 by factors of 1.71 to 2.17 on well-posed items (jackknife CIs exclude unity). We propose a Prompt Specification Reporting Standard for LLM-based research.
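The three steps named in the abstract (enumerate specifications, decompose variance with $\eta^2$, compare dimensions via Fisher $r$-to-$z$) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the dimension names and levels, the synthetic outcome, the use of classical $\eta^2 = SS_\text{between}/SS_\text{total}$, and the choice to set $r = \sqrt{\eta^2}$ and apply the independent-samples form of the Fisher test.

```python
import itertools
import math
import random

# Hypothetical prompt dimensions and levels (not taken from the paper).
DIMENSIONS = {
    "framing": ["neutral", "pro", "con"],
    "model": ["model_a", "model_b"],
    "temperature": [0.0, 1.0],
}

# Synthetic effects standing in for real LLM responses (illustration only).
FRAMING_EFFECT = {"neutral": 0.0, "pro": 1.0, "con": -1.0}
MODEL_EFFECT = {"model_a": 0.0, "model_b": 0.5}


def enumerate_specifications(dimensions, replicates=5, seed=0):
    """Build the full-factorial grid and attach a synthetic outcome y."""
    rng = random.Random(seed)
    names = list(dimensions)
    rows = []
    for combo in itertools.product(*dimensions.values()):
        spec = dict(zip(names, combo))
        for _ in range(replicates):
            y = (FRAMING_EFFECT[spec["framing"]]
                 + MODEL_EFFECT[spec["model"]]
                 + rng.gauss(0.0, 0.3))
            rows.append({**spec, "y": y})
    return rows


def eta_squared(rows, dimension, outcome="y"):
    """Classical eta^2 = SS_between / SS_total for one dimension."""
    values = [r[outcome] for r in rows]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    groups = {}
    for r in rows:
        groups.setdefault(r[dimension], []).append(r[outcome])
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total


def fisher_r_to_z(eta2_a, eta2_b, n_a, n_b):
    """Compare two effect sizes via Fisher's transformation.

    Assumes r = sqrt(eta^2) and independent samples of size n_a and n_b.
    Returns (z, two-sided p-value under the standard normal).
    """
    r_a, r_b = math.sqrt(eta2_a), math.sqrt(eta2_b)
    se = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    z = (math.atanh(r_a) - math.atanh(r_b)) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


rows = enumerate_specifications(DIMENSIONS)
eta = {dim: eta_squared(rows, dim) for dim in DIMENSIONS}
z, p = fisher_r_to_z(eta["framing"], eta["model"], len(rows), len(rows))

for dim, value in eta.items():
    print(f"eta^2[{dim}] = {value:.3f}")
print(f"framing vs model: z = {z:+.2f}, p = {p:.4f}")
```

Because the two $\eta^2$ values in this sketch come from the same set of specifications, the independence assumption behind the standard error is a simplification; a dependent-correlations test or a permutation scheme may be more appropriate in practice.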