When Is Symbolic Regression Tractable?
Abstract
Symbolic Regression (SR) is the task of finding a closed-form mathematical expression that optimizes some objective. Solving this task is NP-hard. However, SR software routinely discovers accurate, interpretable models without exhaustively searching function space. Motivated by this disconnect between worst-case theory and practical success, we study SR through the lens of parameterized complexity theory. In particular, we analyze its tractability with respect to practically relevant parameters such as expression depth, tree size, and the number of variables and primitives used. We show that SR is fixed-parameter tractable (FPT) when parameterized by expression depth or tree size, formalizing an explanation for why the bounded-complexity search of popular SR algorithms succeeds. However, SR becomes W[1]-hard when parameterized by the number of variables or primitives used, identifying variable and primitive selection as a source of intractability. We further establish lower bounds under the Exponential Time Hypothesis, prove approximation hardness, and rule out polynomial kernels.
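To make the bounded-depth intuition concrete, the following is a minimal sketch (not the paper's algorithm; the primitive set and helper names are illustrative assumptions): with a fixed primitive set, the number of expression trees of depth at most d is bounded by a function of d and the number of variables alone, so exhaustive search runs in time f(d) · poly(input), the shape of an FPT algorithm.

```python
import itertools

# Illustrative primitive set (an assumption, not the paper's choice).
UNARY = {"neg": lambda a: -a}
BINARY = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def enumerate_exprs(depth, n_vars):
    """Yield (name, fn) pairs for all expression trees of depth <= depth."""
    # Depth 0: variables and a constant leaf.
    for i in range(n_vars):
        yield (f"x{i}", lambda xs, i=i: xs[i])
    yield ("1", lambda xs: 1.0)
    if depth == 0:
        return
    subs = list(enumerate_exprs(depth - 1, n_vars))
    for name, op in UNARY.items():
        for sn, sf in subs:
            yield (f"{name}({sn})", lambda xs, op=op, sf=sf: op(sf(xs)))
    for name, op in BINARY.items():
        for (ln, lf), (rn, rf) in itertools.product(subs, repeat=2):
            yield (f"{name}({ln},{rn})",
                   lambda xs, op=op, lf=lf, rf=rf: op(lf(xs), rf(xs)))

def best_expr(data, depth):
    """Brute-force bounded-depth SR: minimize squared error over all trees."""
    best = None
    for name, f in enumerate_exprs(depth, len(data[0][0])):
        err = sum((f(xs) - y) ** 2 for xs, y in data)
        if best is None or err < best[0]:
            best = (err, name)
    return best

# Recover y = x0*x1 + x0 from samples; a depth-2 tree suffices.
data = [((a, b), a * b + a) for a in (0.0, 1.0, 2.0) for b in (0.0, 1.0, 3.0)]
err, name = best_expr(data, depth=2)
```

The search space grows only with the depth parameter (and arity/variable count), not with the number of data points, which is why bounding depth or tree size restores tractability while leaving the data-fitting part polynomial.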