Poster in Workshop: RLxF: RL from World Feedback
Fri, Jul 10, 2026 • 12:00 AM – 1:00 AM PDT

Calibrated Ensemble Disagreement Gates DFT Calls in Reinforcement Learning for Transition-State Discovery

Austin Jin ⋅ Bryan Cheng ⋅ Jasper Zhang

Abstract

Locating transition states (TSs) on machine-learned potential energy surfaces is bottlenecked by a fundamental reliability problem: ML force fields (MLFFs) are least accurate precisely at the saddle-point geometries most critical to the search. We introduce ALCHEMIST, a belief-state POMDP framework that treats MLFF ensemble disagreement σ_F as an epistemic uncertainty signal and trains a policy to decide, at each geometry, whether to trust the cheap ensemble or spend a DFT call. The central architectural claim is that σ_F predicts force-prediction error; we validate this on four real-data distributions spanning in-distribution (Transition1x, ρ = +0.833), cross-functional (ρ = +0.872), out-of-distribution (RGD1, ρ = +0.639), and meta-OOD (cross-dataset + cross-functional, ρ = +0.901, N = 200 reactions, 600 configs) settings, all passing a pre-registered ρ > 0.6 gate. The calibration is composition-invariant (4/4 ensemble compositions pass a strict lower-95%-CI > 0.6 test), non-parametrically confirmed (mutual information more than 14 standard deviations above the null on every distribution), and backed by a distribution-free conformal guarantee (100% conditional coverage at TS-like configs across all distributions). At TS geometries, 83% of force error is epistemic and recoverable by a DFT call. A closed-form Bayes-optimal abstention threshold σ²_crit = C / |∂V/∂σ²| matches the empirical DFT-cost Pareto curve within 10% on real chemistry and transfers to OOD distributions. We train the abstention policy with PPO using an uncertainty penalty κ·σ_F that creates a direct reward gradient for σ_F-guided abstention.
Across three complete real-chemistry RL runs (up to 100 iterations, 50 reactions), the policy learns monotonically improving returns (Spearman ρ = +0.791, p < 0.0001) and, in the strongest run (which adds a DFT-guided Gentlest Ascent Dynamics step on abstain), produces the first statistically significant RMSD-toward-TS trend (ρ = −0.210, p = 0.036), with all five final training iterations showing geometry displacement toward the labeled saddle. ALCHEMIST demonstrates that ensemble epistemic uncertainty is a sound, transferable, and actionable signal for directing DFT resources in automated transition-state search.
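
The gating logic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the way disagreement is aggregated over force components (root-mean-square of the per-component spread across ensemble members), and the rule "call DFT when σ² exceeds the threshold" are assumptions made here for illustration. Only the threshold formula C / |∂V/∂σ²| and the ρ > 0.6 gate come from the abstract.

```python
import math
import statistics


def ensemble_disagreement(force_predictions):
    """Disagreement for one geometry (assumed aggregation, not from the paper).

    force_predictions: list of M flat force vectors, one per ensemble member.
    Returns the RMS over components of the across-member standard deviation.
    """
    n_components = len(force_predictions[0])
    variances = [
        statistics.pvariance([member[k] for member in force_predictions])
        for k in range(n_components)
    ]
    return math.sqrt(sum(variances) / n_components)


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def sigma2_crit(dft_cost, dV_dsigma2):
    """Threshold from the abstract: sigma^2_crit = C / |dV/dsigma^2|."""
    return dft_cost / abs(dV_dsigma2)


def should_call_dft(sigma_f, dft_cost, dV_dsigma2):
    """Assumed decision rule: spend a DFT call when sigma_F^2 exceeds the threshold."""
    return sigma_f ** 2 > sigma2_crit(dft_cost, dV_dsigma2)


# Toy calibration check on made-up numbers (illustrative only, not real data).
toy_sigma = [0.1, 0.4, 0.2, 0.9]   # hypothetical ensemble disagreement
toy_error = [0.2, 0.5, 0.1, 1.3]   # hypothetical force error vs. reference
rho = spearman_rho(toy_sigma, toy_error)
passes_gate = rho > 0.6            # the pre-registered gate named in the abstract
```

In this sketch, `dft_cost` plays the role of C and `dV_dsigma2` the sensitivity of the value function to predictive variance; how either quantity is estimated in ALCHEMIST is not specified in the abstract.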