Poster in Workshop: Combining Theory and Benchmarks: Towards A Virtuous Cycle to Understand and Guarantee Foundation Model Performance
Fri, Jul 10, 2026 • 12:00 AM – 1:00 AM PDT

Probabilistic Chain-of-Thought: Sequential Bayesian Inference over Latent Reasoning Correctness

Suriya Dev Saravanakumar ⋅ Ezra Wesenie ⋅ Kishore Nuthalapati ⋅ Laksh Patel

Abstract

Chain-of-thought prompting elicits multi-step reasoning from large language models, yet existing approaches treat the confidence at each step as an independent signal. This independence assumption contradicts the autoregressive generation process, in which errors at early steps propagate forward and corrupt downstream outputs, creating epistemic blind spots where a model appears locally certain but is globally unreliable. We introduce \emph{Probabilistic Chain-of-Thought} (PCoT), which models a reasoning chain as a Hidden Markov Model over latent step correctness and performs exact posterior inference via the forward-backward algorithm. PCoT yields a principled answer confidence $C_{\mathrm{final}}$ and a posterior-driven reflection policy that dominates raw-score threshold rules under the model. On MATH and GSM8K, PCoT reduces Expected Calibration Error by $\mathbf{76\%}$ relative to the best heuristic baseline and improves accuracy by $\mathbf{14.7}$ percentage points at a $2\times$ token budget, while remaining robust across three confidence estimators. Our analysis of \emph{sequential contamination}, whereby a single upstream error suppresses the posteriors of all downstream steps, provides a formal explanation for why point-wise step scoring is insufficient for reliable reasoning evaluation.
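
To make the inference step concrete, the following is a minimal sketch of scaled forward-backward smoothing on a two-state chain (incorrect / correct), not the authors' implementation. The abstract does not specify the emission model, transition probabilities, or prior, so everything numeric here is an assumption chosen for illustration: the function name `step_posteriors`, the Beta(1,2) / Beta(2,1) emission densities over a raw step score in (0, 1), and the "sticky" transition matrix that makes an incorrect state likely to persist.

```python
import numpy as np

def step_posteriors(scores, trans, prior, eps=1e-6):
    """Posterior P(state at step t | all scores) for a 2-state HMM.

    State 0 = incorrect, state 1 = correct.
    trans[i, j] = P(z_{t+1} = j | z_t = i); prior = P(z_0).
    Emission densities are an illustrative assumption:
    p(s | incorrect) = 2(1 - s), p(s | correct) = 2s.
    """
    s = np.clip(np.asarray(scores, dtype=float), eps, 1.0 - eps)
    emis = np.stack([2.0 * (1.0 - s), 2.0 * s], axis=1)  # shape (T, 2)
    T = len(s)

    alpha = np.zeros((T, 2))
    scale = np.zeros(T)
    # Forward pass, normalising at each step for numerical stability.
    alpha[0] = prior * emis[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scaling constants.
    beta = np.ones((T, 2))
    for t in range(T - 2, -1, -1):
        beta[t] = (trans @ (emis[t + 1] * beta[t + 1])) / scale[t + 1]

    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# Hypothetical parameters: an incorrect step tends to stay incorrect.
trans = np.array([[0.95, 0.05],
                  [0.10, 0.90]])
prior = np.array([0.1, 0.9])

clean = step_posteriors([0.9, 0.9, 0.9], trans, prior)
contaminated = step_posteriors([0.9, 0.1, 0.9], trans, prior)

# Column 1 is the posterior probability that each step is correct.
print(clean[:, 1])
print(contaminated[:, 1])
```

Under these assumed parameters, the final step carries the same raw score (0.9) in both chains, yet its posterior probability of being correct is lower in the chain with the weak middle step. This is the kind of effect the abstract calls sequential contamination, and it is invisible to a rule that thresholds each raw score independently.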