Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl
About
PySR is an open-source library for practical symbolic regression, a type of machine learning that aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed backend, a flexible search algorithm, and interfaces with several deep learning packages. PySR's internal search algorithm is a multi-population evolutionary algorithm built around a unique evolve-simplify-optimize loop, designed for optimization of unknown scalar constants in newly-discovered empirical expressions. PySR's backend is the highly optimized Julia library SymbolicRegression.jl, which can also be used directly from Julia. It is capable of fusing user-defined operators into SIMD kernels at runtime, performing automatic differentiation, and distributing populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.
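The multi-population evolve-simplify-optimize loop described above can be illustrated with a toy, standard-library-only Python sketch. This is not PySR's implementation or API: the tuple-based expression trees, the three binary operators, every helper name (`evaluate`, `mutate`, `simplify`, `optimize`, `search`), and the random-perturbation hill climber used for constants (standing in for a proper gradient-based optimizer) are assumptions made for illustration. Each population repeatedly picks a parent by a two-way tournament, mutates it (evolve), folds constant subexpressions (simplify), tunes the remaining constants (optimize), and replaces its worst member; after each generation, populations pass their best member to a neighbour (migration).

```python
import math
import random

# Toy sketch, not PySR's algorithm; all names are hypothetical. Expression
# trees are tuples: ("x",) is the input, ("c", v) a constant, and
# (op, left, right) a binary node with op a key of OPS.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
MAX_SIZE = 15  # children with more nodes than this are discarded

def evaluate(expr, x):
    if expr[0] == "x":
        return x
    if expr[0] == "c":
        return expr[1]
    return OPS[expr[0]](evaluate(expr[1], x), evaluate(expr[2], x))

def mse(expr, xs, ys):
    """Mean squared error; overflow or NaN maps to infinity."""
    try:
        err = sum((evaluate(expr, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    except OverflowError:
        return math.inf
    return err if math.isfinite(err) else math.inf

def tree_size(expr):
    return 1 if expr[0] in ("x", "c") else 1 + tree_size(expr[1]) + tree_size(expr[2])

def random_tree(rng, depth=2):
    if depth == 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.5 else ("c", rng.uniform(-2, 2))
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def mutate(expr, rng):
    """Evolve: nudge a constant or replace a random subtree."""
    if expr[0] == "c" and rng.random() < 0.5:
        return ("c", expr[1] + rng.gauss(0, 0.5))
    if expr[0] in ("x", "c") or rng.random() < 0.3:
        return random_tree(rng)
    if rng.random() < 0.5:
        return (expr[0], mutate(expr[1], rng), expr[2])
    return (expr[0], expr[1], mutate(expr[2], rng))

def simplify(expr):
    """Simplify: fold operator nodes whose children are both constants."""
    if expr[0] in ("x", "c"):
        return expr
    left, right = simplify(expr[1]), simplify(expr[2])
    if left[0] == "c" and right[0] == "c":
        return ("c", OPS[expr[0]](left[1], right[1]))
    return (expr[0], left, right)

def constants(expr):  # constants in left-to-right order
    if expr[0] in ("x", "c"):
        return [expr[1]] if expr[0] == "c" else []
    return constants(expr[1]) + constants(expr[2])

def with_constants(expr, it):  # copy expr, taking new constants from iterator it
    if expr[0] in ("x", "c"):
        return ("c", next(it)) if expr[0] == "c" else expr
    return (expr[0], with_constants(expr[1], it), with_constants(expr[2], it))

def optimize(expr, xs, ys, rng, steps=20):
    """Optimize: random-perturbation hill climbing on the constants
    (a stand-in for a gradient-based optimizer)."""
    best, best_loss = constants(expr), mse(expr, xs, ys)
    for _ in range(steps if best else 0):
        trial = [c + rng.gauss(0, 0.1) for c in best]
        loss = mse(with_constants(expr, iter(trial)), xs, ys)
        if loss < best_loss:
            best, best_loss = trial, loss
    return with_constants(expr, iter(best)), best_loss

def search(xs, ys, n_pops=3, pop_size=20, generations=20, seed=0):
    """Multi-population evolutionary search; returns (best_expr, best_loss)."""
    rng = random.Random(seed)
    pops = [[(mse(e, xs, ys), e) for e in (random_tree(rng) for _ in range(pop_size))]
            for _ in range(n_pops)]
    best_loss, best = min((m for pop in pops for m in pop), key=lambda m: m[0])
    for _gen in range(generations):
        for pop in pops:
            for _step in range(pop_size):
                a, b = rng.sample(pop, 2)  # tournament of two
                child = simplify(mutate((a if a[0] <= b[0] else b)[1], rng))
                if tree_size(child) > MAX_SIZE:
                    continue
                child, loss = optimize(child, xs, ys, rng)
                worst = max(range(pop_size), key=lambda j: pop[j][0])
                if loss < pop[worst][0]:  # replace the worst member
                    pop[worst] = (loss, child)
                if loss < best_loss:
                    best_loss, best = loss, child
        for i, pop in enumerate(pops):  # migration to the next population
            target = pops[(i + 1) % n_pops]
            worst = max(range(pop_size), key=lambda j: target[j][0])
            target[worst] = min(pop, key=lambda m: m[0])
    return best, best_loss
```

Calling `search(xs, ys)` on a few samples of y = 2x + 1, for instance, returns the lowest-loss tree found together with its mean squared error. A practical tool would typically also keep a Pareto front of accuracy versus expression size rather than a single best expression.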
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robot Dynamics Modeling | WAM (test) | RMSE | 0.183 | 98 |
| Dynamics Modeling | Barrett WAM (train) | RMSE | 0.143 | 49 |
| Symbolic Regression | E. coli growth LLM-SR Suite | NMSE | 0.151 | 44 |
| Symbolic Regression | Oscillation 1 LLM-SR Suite | NMSE | 6.12e-12 | 30 |
| Symbolic Regression | LSR-Synth | Overall Acc (Tol 0.01) | 29.46 | 22 |
| Symbolic Regression | CRK (OOD) | NMSE | 2.21e-8 | 18 |
| Symbolic Regression | Oscillator 2 (ID) | NMSE | 4.40e-10 | 18 |
| Symbolic Regression | Oscillator 2 (OOD) | NMSE | 2.05e-6 | 18 |
| Symbolic Regression | Stress–Strain (ID) | NMSE | 0.0187 | 18 |
| Symbolic Regression | Stress–Strain (OOD) | NMSE | 0.0772 | 18 |