Large Language Models as Topological Thinkers: A Benchmark on Graph Persistent Homology
Abstract
Large language models (LLMs) are increasingly used in scientific discovery, system modeling, and decision-making, prompting interest in their ability to reason over complex structured data. Existing benchmarks focus primarily on static or local graph reasoning, overlooking the higher-order structures in real-world systems whose global properties evolve across multiple scales. We introduce LLM4PH, a benchmark that evaluates multi-scale structural reasoning through the lens of persistent homology (PH), a topological framework for tracking how structure evolves across scales. LLM4PH decomposes the PH pipeline into interpretable reasoning tasks spanning synthetic and real-world graphs, revealing that most models struggle to reason about structural transitions and persistence. Beyond task-level evaluation, we perform cross-task ablations on prompt encoding and transfer, explore the effects of post-training, and construct a compositional PH pipeline to assess end-to-end performance. Our results provide the first in-depth view of how well LLMs bridge discrete graph structures with continuous topological abstraction, and offer insights into their potential for structure-aware scientific reasoning.