Enhancing Neural Theorem Proving via High-Quality Proof Selection and Verifier Feedback
Abstract
Recent advances in large language models have accelerated neural theorem proving (NTP). Isabelle is a mature formal theorem prover widely used in software and hardware verification, yet progress in the Isabelle setting remains limited. Existing approaches either optimize search strategies or train on highly imbalanced raw proof corpora, and the specialized structure of Isabelle proofs limits the effectiveness of general-purpose data selection methods. To address these challenges, we adopt a data-centric framework for neural theorem proving in Isabelle. We characterize high-quality formal proof data along three complementary dimensions: proof complexity, semantic coverage, and reasoning diversity (PSR). Guided by these criteria, we propose a PSR-based data selection pipeline that constructs a compact, high-quality training subset. In addition, we leverage verifier feedback as a dynamic data signal at inference time, introducing a dynamic feedback-based prompt optimization procedure that iteratively incorporates Isabelle verifier feedback to guide proof generation. We construct and release a 4k-example high-quality Isabelle dataset selected by the PSR criteria. On the miniF2F test set, fine-tuning solely on PSR-selected data achieves 84.8% Pass@64; when further combined with dynamic feedback-based prompt optimization, the full framework improves performance to 90.6% Pass@64, establishing a new state of the art for neural theorem proving in Isabelle.