Symbolic Regression via Neural-Guided Genetic Programming Population Seeding
About
Symbolic regression is the process of identifying mathematical expressions that fit observed output from a black-box process. It is a discrete optimization problem generally believed to be NP-hard. Prior approaches to solving the problem include neural-guided search (e.g., using reinforcement learning) and genetic programming. In this work, we introduce a hybrid neural-guided/genetic programming approach to symbolic regression and other combinatorial optimization problems. We propose a neural-guided component that seeds the starting population of a random-restart genetic programming component, gradually learning better starting populations. On a number of common benchmark tasks that require recovering the underlying expression from a dataset, our method recovers 65% more expressions than a recently published top-performing model using the same experimental setup. We demonstrate that running many genetic programming generations without interdependence on the neural-guided component performs better for symbolic regression than alternative formulations where the two are more strongly coupled. Finally, we introduce a new set of 22 symbolic regression benchmark problems with increased difficulty over existing benchmarks. Source code is provided at www.github.com/brendenpetersen/deep-symbolic-optimization.
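The seeding loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the learned neural sampler is replaced here by a plain token-frequency table, the genetic programming step uses only tournament selection and point mutation, and every name (`sample_expr`, `run_gp`, `neural_guided_gp`, the token set, the update rule) is hypothetical. It only shows the control flow under those assumptions: sample a fresh population, run several GP generations with no feedback from the sampler, then update the sampler from the best GP results.

```python
import math
import random

# Toy token library (an assumption for illustration): token -> arity.
ARITY = {"add": 2, "mul": 2, "sin": 1, "x": 0, "one": 0}
TOKENS = list(ARITY)

def sample_expr(probs, rng, max_len=15):
    """Sample a prefix-notation expression from per-token weights."""
    expr, open_slots = [], 1
    while open_slots > 0:
        # Allow only terminals once the length budget is used up.
        if len(expr) + open_slots >= max_len:
            choices = [t for t in TOKENS if ARITY[t] == 0]
        else:
            choices = TOKENS
        tok = rng.choices(choices, weights=[probs[t] for t in choices])[0]
        expr.append(tok)
        open_slots += ARITY[tok] - 1
    return expr

def evaluate(expr, x):
    """Evaluate a prefix expression at the point x."""
    def walk(i):
        tok = expr[i]
        if tok == "x":
            return x, i + 1
        if tok == "one":
            return 1.0, i + 1
        a, j = walk(i + 1)
        if tok == "sin":
            return math.sin(a), j
        b, k = walk(j)
        return (a + b if tok == "add" else a * b), k
    return walk(0)[0]

def run_gp(population, fitness, rng, generations=5):
    """Simplified GP: tournament selection, point mutation, elitist survival."""
    pop = [list(e) for e in population]
    for _ in range(generations):
        children = []
        for _ in range(len(pop)):
            parent = max(rng.sample(pop, 3), key=fitness)
            child = list(parent)
            i = rng.randrange(len(child))
            # Swap in a token of the same arity so the expression stays valid.
            same_arity = [t for t in TOKENS if ARITY[t] == ARITY[child[i]]]
            child[i] = rng.choice(same_arity)
            children.append(child)
        pop = sorted(pop + children, key=fitness, reverse=True)[:len(pop)]
    return pop

def neural_guided_gp(xs, ys, iterations=10, pop_size=30, seed=0):
    rng = random.Random(seed)
    # Stand-in for the neural sampler: one learnable weight per token.
    probs = {t: 1.0 for t in TOKENS}

    def fitness(expr):
        mse = sum((evaluate(expr, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        return 1.0 / (1.0 + mse)

    best = None
    for _ in range(iterations):
        # 1. The sampler proposes a fresh starting population (random restart).
        seeds = [sample_expr(probs, rng) for _ in range(pop_size)]
        # 2. GP runs several generations without consulting the sampler.
        final = run_gp(seeds, fitness, rng)
        # 3. The best GP results nudge the sampler (crude frequency update,
        #    an assumption standing in for gradient-based training).
        for expr in final[: pop_size // 5]:
            for tok in expr:
                probs[tok] += 0.1
        if best is None or fitness(final[0]) > fitness(best):
            best = final[0]
    return best, fitness(best)

# Usage: try to fit y = x^2 + x on a small grid.
xs = [i / 10 for i in range(-10, 11)]
ys = [x * x + x for x in xs]
best, score = neural_guided_gp(xs, ys)
print(best, round(score, 4))
```

The key design point mirrored here is the loose coupling: the sampler is only touched before and after each GP run, never between generations.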
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Symbolic Regression | DGSR benchmark | Recall | 100 | 22 |
| Symbolic Regression | SRBench Feynman | R^2 Score | 0.971 | 16 |
| Symbolic Regression | SRBench Black-box | R^2 | 0.9033 | 16 |
| Symbolic Regression | Nguyen | R^2 | 99.99 | 15 |
| Symbolic Regression | Livermore | R^2 | 97.46 | 15 |
| Symbolic Regression | Vladislavleva | R^2 | 99.63 | 15 |
| Symbolic Regression | Constant | R^2 | 0.9988 | 15 |
| Symbolic Regression | Keijzer | R^2 | 0.9924 | 15 |
| Symbolic Regression | Korns | R^2 | 0.9872 | 15 |
| Symbolic Regression | SRSD-Feynman Easy | NED | 0.63 | 12 |