ConvexBench: Can LLMs Recognize Convex Functions?
Yepeng Liu ⋅ Yu Huang ⋅ Yu-Xiang Wang ⋅ Yingbin Liang ⋅ Yuheng Bu
Abstract
Convexity recognition plays a central role in many optimization, control, and learning problems. However, the ability of Large Language Models (LLMs) to identify this property in symbolic expressions remains unexamined. We introduce \cb, a scalable and mechanically verifiable benchmark that tests whether LLMs can determine the convexity of a symbolic objective under deep functional composition. Experiments on frontier LLMs reveal a sharp \textit{compositional reasoning gap}: performance degrades rapidly with increasing depth, dropping from an F1-score of $1.0$ at depth $2$ to approximately $0.2$ at depth $100$. Inspection of the models' reasoning traces indicates two failure modes: \textit{parsing failure} and \textit{lazy reasoning}. To address these limitations, we propose an agentic divide-and-conquer framework that (i) offloads parsing to an external tool that constructs an abstract syntax tree (AST), and (ii) enforces recursive reasoning over each intermediate sub-expression with a focused context. This framework reliably mitigates deep-composition failures, yielding substantial performance improvements at large depths (e.g., an F1-score of $1.0$ at depth $100$).
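To make the divide-and-conquer structure concrete, the following is a minimal, self-contained sketch in Python. The actual framework is agentic: an LLM judges each sub-expression with focused context, whereas this sketch substitutes a rule-based curvature check in the style of disciplined convex programming. Python's `ast` module stands in for the external parsing tool, and the names (`ATOMS`, `compose`, `curvature`) and the small atom library are illustrative assumptions, not the paper's implementation.

```python
import ast

# Curvature labels used by DCP-style composition rules.
AFFINE, CONVEX, CONCAVE, UNKNOWN = "affine", "convex", "concave", "unknown"

# Illustrative library of unary atoms: name -> (curvature, monotonicity).
ATOMS = {
    "exp": (CONVEX, "nondecreasing"),
    "log": (CONCAVE, "nondecreasing"),
    "abs": (CONVEX, None),  # not monotone, so only affine arguments compose safely
}

def compose(outer, mono, inner):
    """Standard DCP composition rule for a scalar atom applied to one argument."""
    if inner == AFFINE:
        return outer  # an atom of an affine argument keeps the atom's curvature
    if outer == CONVEX and ((mono == "nondecreasing" and inner == CONVEX)
                            or (mono == "nonincreasing" and inner == CONCAVE)):
        return CONVEX
    if outer == CONCAVE and ((mono == "nondecreasing" and inner == CONCAVE)
                             or (mono == "nonincreasing" and inner == CONVEX)):
        return CONCAVE
    return UNKNOWN

def curvature(node):
    """Divide-and-conquer: classify each AST node from its children's labels."""
    if isinstance(node, (ast.Constant, ast.Name)):
        return AFFINE  # constants and variables are affine
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left, right = curvature(node.left), curvature(node.right)
        if AFFINE in (left, right):  # affine + anything keeps the other label
            return right if left == AFFINE else left
        return left if left == right else UNKNOWN  # convex + convex = convex, etc.
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in ATOMS):
        outer, mono = ATOMS[node.func.id]
        return compose(outer, mono, curvature(node.args[0]))
    return UNKNOWN  # the rules are sufficient, not necessary

# exp is convex and nondecreasing, and abs(x) + y is convex, so the whole
# expression is certified convex by recursing over the parsed AST.
expr = ast.parse("exp(abs(x) + y) + abs(x)", mode="eval").body
print(curvature(expr))  # -> convex
```

Because the composition rules are sufficient but not necessary, this sketch may return "unknown" for functions that are in fact convex; the agentic framework instead issues a focused LLM judgment at each AST node, while keeping the same recursive decomposition.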