Poster in Workshop: AI for Math

PutnamBench: A Multilingual Competition-Mathematics Benchmark for Formal Theorem-Proving

George Tsoukalas · Jasper Lee · John Jennings · Jimmy Xin · Michelle Ding · Michael Jennings · Amitayush Thakur · Swarat Chaudhuri

Presentation: AI for Math Workshop
Fri 26 Jul, midnight to 8 a.m. PDT

Abstract:

We present PutnamBench, a new multilingual evaluation benchmark for formal theorem-proving. PutnamBench consists of formalizations of problems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the problem statements come with formalizations in Lean 4 and Isabelle; a substantial subset have Coq formalizations as well. PutnamBench comprises 1337 hand-written formalizations across the three proof assistants, and aims to benchmark the next generation of theorem-proving algorithms for competition mathematics. Proving the theorems requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We evaluate several established neural and symbolic theorem provers using PutnamBench. These approaches can solve only a handful of the problems, establishing our benchmark as a difficult open challenge for research on formal theorem-proving. PutnamBench is available at https://github.com/trishullab/PUTNAM.
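To give a sense of what a formal theorem-proving benchmark entry looks like, here is a hypothetical Lean 4 statement in the general style of such formalizations. The theorem name, statement, and claim are illustrative inventions, not drawn from PutnamBench itself; the statement is given without a proof, and a prover is expected to supply one in place of `sorry`.

```lean
import Mathlib

-- Hypothetical example (not from the benchmark): for every natural
-- number n, n^2 + n is even. A theorem-proving system is asked to
-- replace `sorry` with a complete formal proof.
theorem putnam_style_example (n : ℕ) : 2 ∣ n ^ 2 + n := by
  sorry
```

Actual competition problems are substantially harder to state and prove; their formalizations often require careful encoding of answers and side conditions on top of the core claim.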
