Invited Talk - Beyond Theorem Proving: Towards Proof Engineering at Scale
Abstract
Large language models have advanced from solving Olympiad-style problems expressed in natural language to constructing complete, step-by-step formal proofs verifiable by proof assistants like Lean. This talk surveys that journey: from the rise of competitive math and programming benchmarks, through the development of step-by-step, whole-proof, and hybrid proving paradigms, to techniques for autoformalization leveraging human-curated formal libraries. Despite this impressive progress, a crucial limitation remains: current efforts still largely focus on isolated theorem-proving tasks, neglecting the complexity of maintaining, refactoring, and extending large, continuously evolving bodies of formal mathematics. I argue that the next significant milestone is proof engineering—automating everyday code edits, bug fixes, and feature additions that sustain active formal libraries like Mathlib. Preliminary experiments using file-level datasets extracted from thousands of real-world commits indicate that even state-of-the-art models struggle to produce syntactically valid and semantically faithful edits guided by natural-language instructions. These findings highlight opportunities to adapt realistic, agent-based approaches from software engineering to formal mathematics. Ultimately, I aim to inspire a community-wide effort towards moving beyond medal-winning provers to autonomous maintainers of formal mathematical knowledge.