Symbolic Regression via Neural-Guided Genetic Programming Population Seeding
About
Symbolic regression is the process of identifying mathematical expressions that fit observed output from a black-box process. It is a discrete optimization problem generally believed to be NP-hard. Prior approaches to solving the problem include neural-guided search (e.g. using reinforcement learning) and genetic programming. In this work, we introduce a hybrid neural-guided/genetic programming approach to symbolic regression and other combinatorial optimization problems. We propose a neural-guided component used to seed the starting population of a random restart genetic programming component, gradually learning better starting populations. On a number of common benchmark tasks to recover underlying expressions from a dataset, our method recovers 65% more expressions than a recently published top-performing model using the same experimental setup. We demonstrate that running many genetic programming generations without interdependence on the neural-guided component performs better for symbolic regression than alternative formulations where the two are more strongly coupled. Finally, we introduce a new set of 22 symbolic regression benchmark problems with increased difficulty over existing benchmarks. Source code is provided at www.github.com/brendenpetersen/deep-symbolic-optimization.
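The seeding loop described in the abstract can be illustrated with a small toy example. The following is a minimal sketch under loud assumptions, not the authors' implementation: a table of per-slot categorical distributions stands in for the neural-guided component, a plain genetic algorithm over fixed-length polynomial coefficient vectors stands in for genetic programming over expression trees, and the update rule simply moves the distributions toward the best samples. All names (`neural_guided_seeding`, `run_gp`, `update_controller`, and so on) are hypothetical.

```python
import random

# Toy search space (assumption): an "expression" is a polynomial
# c0 + c1*x + c2*x^2 + c3*x^3; a genome stores one index into COEFFS per slot.
COEFFS = [-2, -1, 0, 1, 2]
DEGREE = 3
XS = [i / 4 for i in range(-8, 9)]  # sample points in [-2, 2]

def evaluate(genome, x):
    return sum(COEFFS[g] * x ** k for k, g in enumerate(genome))

def fitness(genome, ys):
    # Negative mean squared error: higher is better, 0.0 is an exact fit.
    return -sum((evaluate(genome, x) - y) ** 2 for x, y in zip(XS, ys)) / len(XS)

def sample_population(probs, size, rng):
    # Stand-in for the neural-guided component: one categorical
    # distribution per coefficient slot (not a neural network).
    return [[rng.choices(range(len(COEFFS)), weights=p)[0] for p in probs]
            for _ in range(size)]

def run_gp(population, ys, generations, rng, mutation_rate=0.2):
    # Stand-in for genetic programming: tournament selection,
    # one-point crossover, and per-gene mutation on fixed-length genomes.
    pop = [list(g) for g in population]
    for _ in range(generations):
        children = []
        while len(children) < len(pop):
            a = max(rng.sample(pop, 3), key=lambda g: fitness(g, ys))
            b = max(rng.sample(pop, 3), key=lambda g: fitness(g, ys))
            cut = rng.randrange(1, len(a))
            child = a[:cut] + b[cut:]
            child = [rng.randrange(len(COEFFS)) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
    return pop

def update_controller(probs, elites, lr=0.3):
    # Move each slot's distribution toward the elites' empirical frequencies.
    new_probs = []
    for slot, p in enumerate(probs):
        counts = [0] * len(COEFFS)
        for g in elites:
            counts[g[slot]] += 1
        freq = [c / len(elites) for c in counts]
        new_probs.append([(1 - lr) * pi + lr * fi for pi, fi in zip(p, freq)])
    return new_probs

def neural_guided_seeding(ys, restarts=20, pop_size=30, generations=5, seed=0):
    rng = random.Random(seed)
    probs = [[1 / len(COEFFS)] * len(COEFFS) for _ in range(DEGREE + 1)]
    best = None
    for _ in range(restarts):
        seeds = sample_population(probs, pop_size, rng)  # controller seeds a fresh GP run
        evolved = run_gp(seeds, ys, generations, rng)    # GP runs without consulting the controller
        ranked = sorted(seeds + evolved, key=lambda g: fitness(g, ys), reverse=True)
        probs = update_controller(probs, ranked[: pop_size // 5])  # learn better seeds
        if best is None or fitness(ranked[0], ys) > fitness(best, ys):
            best = ranked[0]
    return best, probs

# Example target (assumption): y = x^3 + x^2 + x, i.e. coefficients [0, 1, 1, 1].
ys = [x ** 3 + x ** 2 + x for x in XS]
best, probs = neural_guided_seeding(ys)
print([COEFFS[g] for g in best], fitness(best, ys))
```

The point of the sketch is the coupling structure: each restart draws a fresh population from the learned sampler, the evolutionary loop then runs for several generations on its own, and only the resulting best individuals feed back into the sampler. This mirrors the loosely coupled formulation the abstract reports as working best, but the search space, operators, and update rule here are deliberately simplified.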
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Symbolic Regression | DGSR benchmark | Recall | 100 | 22 |
| Symbolic Regression | Nguyen, Livermore, and Keijzer Consolidated 1.0 (Length <= 8) | Exact Recovery Rate | 98 | 10 |
| Symbolic Regression | Nguyen, Livermore, and Keijzer Consolidated 1.0 (Length 21-30) | Average Exact Recovery Rate | 1.70e+3 | 10 |
| Symbolic Regression | Nguyen, Livermore, and Keijzer Consolidated 1.0 (Length 9-10) | Exact Recovery Rate | 77 | 10 |
| Symbolic Regression | Nguyen, Livermore, and Keijzer Consolidated 1.0 (Length 11-12) | Average Exact Recovery Rate | 0.25 | 10 |
| Symbolic Regression | Nguyen, Livermore, and Keijzer Consolidated 1.0 (Length 13-14) | Average Exact Recovery Rate | 29 | 10 |
| Symbolic Regression | Nguyen, Livermore, and Keijzer Consolidated 1.0 (Length 15-16) | Exact Recovery Rate | 0.16 | 10 |
| Symbolic Regression | Nguyen, Livermore, and Keijzer Consolidated 1.0 (Length 17-20) | Average Exact Recovery Rate | 2 | 10 |
| Symbolic Regression | Nguyen, Livermore, and Keijzer Consolidated 1.0 (Length >= 31) | Exact Recovery Rate | 0.00e+0 | 10 |