ICML Lean4trace: Data augmentation for neural theorem proving in Lean

Poster
in
Workshop: AI for Math Workshop

Lean4trace: Data augmentation for neural theorem proving in Lean

Vasilii Nesterov · Yermek Kapushev · Mikhail Burtsev

[ Abstract ] [ Project Page ]

[ Slides] [ OpenReview]

Abstract:

Integrating large language models as proof assistants with theorem provers has shown great promise. However, one of the major challenges in this field is the scarcity of training data. To address this, we release a new open-source tool, Lean4trace, for training data extraction from Lean 4 sources. Unlike previous approaches, Lean4trace is deeply integrated into the Lean elaborator, allowing us to modify proofs on-the-fly. Leveraging this feature, we propose two methods of data augmentation in Lean: (1) decomposing composite proof steps into multiple simpler steps; (2) testing existing proof automation tactics at each proof state and collecting the successful ones. Models trained on this augmented data are capable of proving 58.0% of theorems from a hold-out subset of Mathlib and 35.6% of the test subset of the MiniF2F benchmark.

Chat is not available.

Poster in Workshop: AI for Math Workshop

Lean4trace: Data augmentation for neural theorem proving in Lean

Vasilii Nesterov · Yermek Kapushev · Mikhail Burtsev

Poster
in
Workshop: AI for Math Workshop