Euclean: Automated Geometry Problem Formalization with Unified Verification in Lean
Linbin Tang ⋅ Jingyan You ⋅ Zilin Kang ⋅ Hanzhang Liu ⋅ Sophia Zhang ⋅ Zenan Li ⋅ Chenrui Cao ⋅ Liangcheng Song ⋅ Jiaao Wu ⋅ Xian Zhang ⋅ Fan Yang
Abstract
Recent formal reasoning systems achieve IMO-level performance, but create a fragmented landscape: algebra and number theory use Lean, while geometry relies on domain-specific languages with limited formal guarantees. This fragmentation increases the trusted computing base and hinders unified model development. Existing geometry-in-Lean efforts (LeanEuclid, LeanGeo) introduce custom axiom systems incompatible with standard \mathlib{}, and their small scale ($<$1K problems) prevents large-scale training. However, native \mathlib{} autoformalization of geometry poses unique challenges: explicating implicit diagrammatic assumptions (e.g., topological configuration and non-degeneracy)---unlike existing custom systems that defer validity checks to external solvers---and adapting to \mathlib{}'s small, rapidly-evolving geometric constructs. We present \method{}, a framework that addresses these challenges through a four-stage pipeline---constraint explication, configuration anchoring, formalization mapping, and iterative repair---to automatically formalize geometry in native \mathlib{}. We construct OMNI-Geometry (768 competition problems) and Numina-Geometry (177,597 problems), the largest geometry formalization dataset in Lean. Human evaluation shows 48.89\% TOP1 and 73.33\% TOP5 accuracy. Training Goedel v2 on our formalizations improves proof success from 13.6\% to 15.1\%, validating dataset quality for unified neural theorem proving.
Successful Page Load