Beyond Accuracy and Complexity: The Effective Information Criterion for Structurally Stable Symbolic Regression
Zihan Yu ⋅ Guanren Wang ⋅ Ding ⋅ Huandong Wang ⋅ Yong Li
Abstract
Symbolic regression (SR) traditionally balances accuracy and complexity, implicitly assuming that simpler formulas are structurally more rational. We argue that this assumption is insufficient: existing algorithms often exploit this two-objective criterion to discover accurate and compact but structurally irrational formulas that are numerically ill-conditioned and physically inexplicable. Inspired by the structural stability of real physical laws, we propose the Effective Information Criterion (EIC) to quantify formula rationality. EIC models formulas as information channels and measures the amplification of inherent rounding noise during recursive calculation, effectively distinguishing physically plausible structures from pathological ones without relying on ground truth. Our analysis reveals a stark structural-stability gap between human-derived equations and SR-discovered results. By integrating EIC into SR workflows, we provide explicit structural guidance: for heuristic search, EIC steers algorithms toward stable regions to yield superior Pareto frontiers; for generative models, EIC-based filtering improves pre-training sample efficiency by a factor of 2–4 and boosts generalization $R^2$ by 22.4\%. Finally, an extensive study with 108 human experts shows that EIC aligns with human preferences in 70\% of cases, validating structural stability as a critical prerequisite for human-perceived interpretability. We release our code at https://anonymous.4open.science/r/EIC-91B2.
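The abstract's central idea, that a formula acts as a channel which can amplify rounding-scale noise during recursive evaluation, can be illustrated with a toy Monte-Carlo probe. This is a minimal sketch under our own assumptions, not the paper's actual EIC definition: it perturbs inputs by a small relative amount and reports the worst-case ratio of relative output change to relative input change, so a structurally unstable formula (e.g. one relying on catastrophic cancellation) scores far higher than a stable one of similar complexity.

```python
import numpy as np

def amplification(f, x, rel=1e-6, trials=200, seed=0):
    """Worst-case relative-error amplification of f near x,
    estimated by Monte-Carlo perturbation (illustrative proxy only)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y0 = f(x)
    worst = 0.0
    for _ in range(trials):
        dx = rng.standard_normal(x.size) * rel * np.abs(x)  # rounding-scale noise
        y = f(x + dx)
        out_rel = abs(y - y0) / max(abs(y0), 1e-300)  # relative output change
        in_rel = np.linalg.norm(dx) / np.linalg.norm(x)  # relative input change
        worst = max(worst, out_rel / in_rel)
    return worst

# Two formulas of comparable symbolic complexity:
stable = lambda x: x[0] * x[1]                      # well-conditioned
unstable = lambda x: (x[0] + 1e8) - (x[1] + 1e8)    # catastrophic cancellation

x = np.array([1.0, 1.000001])
print(amplification(stable, x))    # stays O(1): noise passes through unamplified
print(amplification(unstable, x))  # orders of magnitude larger: noise dominates
```

The gap between the two scores mirrors the "structural stability gap" the abstract describes: both expressions compute a simple function of two inputs, but the cancellation-based form inflates relative noise by roughly the ratio of its intermediate magnitudes to its output magnitude.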