Poster
Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large $p$
Shengbin Ye · Meng Li
East Exhibition Hall A-B #E-1601
Scientists often want to understand how different factors relate to each other by finding clear, math-based rules in their data. Symbolic regression (SR) is a technique that does exactly this: it searches for equations that explain patterns in the data, which can lead to new scientific insights. But SR struggles when there are many candidate variables (large $p$), which is common in fields like biology, physics, and climate science. Too many inputs make the search very slow and the resulting equations hard to understand. Our method, called PAN+SR, helps SR focus on just the most important variables before it tries to find equations. It does this with a nonparametric, model-free filtering step that is flexible and avoids strong assumptions about the form of the relationship. We also built a new set of benchmark problems that reflect the messy, high-dimensional data real scientists deal with. PAN+SR improves the performance of many existing SR tools and helps them find better, simpler equations more quickly. This makes it easier for researchers to use symbolic regression in real-world science, where both accuracy and interpretability matter.
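The screen-then-search idea described above can be illustrated with a small sketch. This is not the PAN procedure itself, whose details are not given here. It swaps in a simple stand-in filter: a binned correlation ratio that estimates Var(E[y | x_j]) / Var(y) for each variable without assuming any functional form. The names `correlation_ratio` and `screen_variables`, the synthetic data, and the choice of `keep=2` are all assumptions made for illustration.

```python
import math
import random


def correlation_ratio(x, y, n_bins=10):
    """Binned estimate of Var(E[y | x]) / Var(y), a model-free score in [0, 1].

    Illustrative stand-in filter only; it is NOT the PAN criterion.
    """
    n = len(y)
    y_mean = sum(y) / n
    total = sum((v - y_mean) ** 2 for v in y)
    if total == 0.0:
        return 0.0
    # Sort sample indices by x and split them into equal-count bins.
    order = sorted(range(n), key=lambda i: x[i])
    between = 0.0
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        bin_mean = sum(y[i] for i in idx) / len(idx)
        between += len(idx) * (bin_mean - y_mean) ** 2
    return between / total


def screen_variables(X, y, keep):
    """Score every column of X and return the indices of the `keep` best."""
    p = len(X[0])
    scores = [correlation_ratio([row[j] for row in X], y) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep]), scores


# Synthetic example (assumed): 50 candidate inputs, only columns 3 and 17 matter.
rng = random.Random(0)
n, p = 2000, 50
X = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(n)]
y = [math.sin(3 * row[3]) + row[17] ** 2 + 0.05 * rng.gauss(0, 1) for row in X]

selected, scores = screen_variables(X, y, keep=2)

# The reduced design matrix is what would be handed to any SR tool.
X_reduced = [[row[j] for j in selected] for row in X]
print("selected columns:", selected)
```

A marginal score like this one can miss variables that matter only through interactions, so it should be read as a toy illustration of pre-screening rather than a substitute for the method presented in the poster.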