Symbolic Regression with a Learned Concept Library
About
We present a novel method for symbolic regression (SR), the task of searching for compact programmatic hypotheses that best explain a dataset. The problem is commonly solved using genetic algorithms; we show that we can enhance such methods by inducing a library of abstract textual concepts. Our algorithm, called LaSR, uses zero-shot queries to a large language model (LLM) to discover and evolve concepts occurring in known high-performing hypotheses. We discover new hypotheses using a mix of standard evolutionary steps and LLM-guided steps (obtained through zero-shot LLM queries) conditioned on discovered concepts. Once discovered, hypotheses are used in a new round of concept abstraction and evolution. We validate LaSR on the Feynman equations, a popular SR benchmark, as well as a set of synthetic tasks. On these benchmarks, LaSR substantially outperforms a variety of state-of-the-art SR approaches based on deep learning and evolutionary algorithms. Moreover, we show that LaSR can be used to discover a novel and powerful scaling law for LLMs.
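The alternation described above, between ordinary evolutionary steps, concept-conditioned steps, and a fresh round of concept abstraction, can be illustrated with a toy loop. This is only a sketch under loud assumptions and not LaSR's implementation: hypotheses are restricted to `outer(inner(x))` over three made-up primitives, and the two LLM queries are replaced by simple heuristics (`abstract_concepts` and `guided_step` are hypothetical stand-ins, as are `p_llm` and the other parameter names).

```python
import math
import random

# Toy primitive set (an assumption; the abstract does not list operators).
UNARY = {"sin": math.sin, "cos": math.cos, "sq": lambda v: v * v}

def random_expr(rng):
    """A hypothesis is a pair (outer, inner) meaning outer(inner(x))."""
    names = list(UNARY)
    return (rng.choice(names), rng.choice(names))

def evaluate(expr, x):
    outer, inner = expr
    return UNARY[outer](UNARY[inner](x))

def loss(expr, data):
    """Mean squared error of a hypothesis on (x, y) pairs."""
    return sum((evaluate(expr, x) - y) ** 2 for x, y in data) / len(data)

def mutate(expr, rng):
    """Standard evolutionary step: resample one operator at random."""
    ops = list(expr)
    ops[rng.randrange(2)] = rng.choice(list(UNARY))
    return tuple(ops)

def abstract_concepts(top_exprs):
    """Stand-in for the LLM concept-abstraction query (assumption):
    the 'concept' is simply the most frequent operator among top hypotheses."""
    counts = {}
    for expr in top_exprs:
        for op in expr:
            counts[op] = counts.get(op, 0) + 1
    return [max(counts, key=counts.get)]

def guided_step(expr, concepts, rng):
    """Stand-in for an LLM-guided step (assumption):
    overwrite one operator with an operator named by a concept."""
    ops = list(expr)
    ops[rng.randrange(2)] = rng.choice(concepts)
    return tuple(ops)

def search(data, rounds=20, pop_size=4, p_llm=0.3, seed=0):
    rng = random.Random(seed)
    population = [random_expr(rng) for _ in range(pop_size)]
    concepts = []
    for _ in range(rounds):
        children = []
        for expr in population:
            # Mix concept-guided steps with plain mutations.
            if concepts and rng.random() < p_llm:
                children.append(guided_step(expr, concepts, rng))
            else:
                children.append(mutate(expr, rng))
        # Deduplicate in a deterministic order, then keep the best hypotheses.
        merged = list(dict.fromkeys(population + children))
        population = sorted(merged, key=lambda e: loss(e, data))[:pop_size]
        # New round of concept abstraction from the current top hypotheses.
        concepts = abstract_concepts(population[: max(1, pop_size // 2)])
    return population[0]

# Synthetic target y = sin(x^2), sampled on a small grid.
data = [(i / 10, math.sin((i / 10) ** 2)) for i in range(20)]
best = search(data)
print(best, loss(best, data))
```

The point of the sketch is the control flow: concepts are re-derived from the best hypotheses each round and then bias a fraction of the next round's proposals, while the remaining proposals stay purely evolutionary.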
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Symbolic Regression | E. coli growth LLM-SR Suite | NMSE | 0.0085 | 44 |
| Symbolic Regression | Oscillation 1 LLM-SR Suite | NMSE | 7.15e-6 | 30 |
| Symbolic Regression | LSR-Synth | Overall Acc (Tol 0.01) | 16.02 | 22 |
| Symbolic Regression | Stress–Strain (OOD) | NMSE | 0.0618 | 18 |
| Symbolic Regression | CRK (ID) | NMSE | 9.55e-11 | 18 |
| Symbolic Regression | Stress–Strain (ID) | NMSE | 0.0164 | 18 |
| Symbolic Regression | CRK (OOD) | NMSE | 7.25e-8 | 18 |
| Symbolic Regression | Oscillator 2 (ID) | NMSE | 2.68e-8 | 18 |
| Symbolic Regression | Oscillator 2 (OOD) | NMSE | 3.51e-5 | 18 |
| Symbolic Regression | Oscillator 1 (OOD) | NMSE | 0.0054 | 18 |
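
Most rows above report NMSE (normalized mean squared error). The page does not state which normalization the benchmarks use, so the sketch below assumes one common definition: mean squared error divided by the variance of the target values. Under that assumption, a perfect fit scores 0 and always predicting the target mean scores 1. The function name `nmse` is illustrative.

```python
def nmse(y_true, y_pred):
    """Mean squared error divided by the variance of the targets.

    Assumed definition; a benchmark may normalize differently.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_y = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    variance = sum((t - mean_y) ** 2 for t in y_true) / n
    if variance == 0:
        raise ValueError("targets are constant; NMSE is undefined")
    return mse / variance

print(nmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit
print(nmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # mean predictor
```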