Transformer-based Planning for Symbolic Regression
About
Symbolic regression (SR) is a challenging task in machine learning that involves finding a mathematical expression for a function based on its values. Recent advancements in SR have demonstrated the effectiveness of pre-trained transformer-based models in generating equations as sequences, leveraging large-scale pre-training on synthetic datasets and offering notable advantages in terms of inference time over classical Genetic Programming (GP) methods. However, these models primarily rely on supervised pre-training goals borrowed from text generation and overlook equation discovery objectives like accuracy and complexity. To address this, we propose TPSR, a Transformer-based Planning strategy for Symbolic Regression that incorporates Monte Carlo Tree Search into the transformer decoding process. Unlike conventional decoding strategies, TPSR enables the integration of non-differentiable feedback, such as fitting accuracy and complexity, as external sources of knowledge into the transformer-based equation generation process. Extensive experiments on various datasets show that our approach outperforms state-of-the-art methods, enhancing the model's fitting-complexity trade-off, extrapolation abilities, and robustness to noise.
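The core idea above is to plan over the decoder's token choices with Monte Carlo Tree Search, scoring complete equations with non-differentiable feedback. Below is a minimal, self-contained Python sketch of that idea. It is not the authors' TPSR implementation and rests on several assumptions. The vocabulary is a toy four-token prefix-notation set. Uniform random rollouts stand in for the pretrained transformer. Selection uses plain UCT rather than any prior-weighted rule. The reward is a hypothetical 1/(1 + NMSE) + λ·exp(-len/L), which trades fit against length. All names (`ARITY`, `MAX_LEN`, `LAM`, `C_UCT`, `open_slots`, `allowed`, `evaluate`, `reward`, `Node`, `rollout`, `mcts_decode`) are hypothetical.

```python
import math
import random

import numpy as np

# Hypothetical toy vocabulary in prefix notation (token -> arity); a stand-in
# for the pretrained transformer's equation vocabulary.
ARITY = {"add": 2, "mul": 2, "x": 0, "1": 0}
MAX_LEN = 7   # hypothetical maximum sequence length L
LAM = 0.1     # hypothetical weight of the complexity term
C_UCT = 1.0   # UCT exploration constant

def open_slots(seq):
    """Operands still missing before seq forms a complete expression tree."""
    need = 1
    for tok in seq:
        need += ARITY[tok] - 1
    return need

def allowed(seq):
    """Legal next tokens: none if seq is complete, else those finishable within MAX_LEN."""
    need = open_slots(seq)
    if need == 0:
        return []
    room = MAX_LEN - len(seq) - 1
    return [t for t, a in ARITY.items() if need - 1 + a <= room]

def evaluate(seq, x):
    """Evaluate a complete prefix expression on the array x."""
    def rec(i):
        tok = seq[i]
        if tok == "x":
            return x, i + 1
        if tok == "1":
            return np.ones_like(x), i + 1
        left, j = rec(i + 1)
        right, k = rec(j)
        return (left + right if tok == "add" else left * right), k
    return rec(0)[0]

def reward(seq, x, y):
    """Hypothetical non-differentiable reward: fit term plus a shortness bonus."""
    y_hat = evaluate(seq, x)
    nmse = np.mean((y - y_hat) ** 2) / np.var(y)
    return float(1.0 / (1.0 + nmse) + LAM * math.exp(-len(seq) / MAX_LEN))

class Node:
    def __init__(self, seq, parent=None):
        self.seq, self.parent = seq, parent
        self.children = []
        self.untried = allowed(seq)
        self.visits, self.value = 0, 0.0

    def uct_child(self):
        log_n = math.log(self.visits)
        return max(self.children, key=lambda c: c.value / c.visits
                   + C_UCT * math.sqrt(log_n / c.visits))

def rollout(seq, rng):
    """Finish seq with uniformly random legal tokens (stand-in for the transformer)."""
    seq = list(seq)
    while open_slots(seq) > 0:
        seq.append(rng.choice(allowed(seq)))
    return seq

def mcts_decode(x, y, iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node([])
    best_seq, best_r = None, -1.0
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded, non-terminal nodes.
        while not node.untried and node.children:
            node = node.uct_child()
        # Expansion: add one untried token as a new child.
        if node.untried:
            tok = node.untried.pop(rng.randrange(len(node.untried)))
            node.children.append(Node(node.seq + [tok], parent=node))
            node = node.children[-1]
        # Simulation: complete the equation and score it.
        full = rollout(node.seq, rng)
        r = reward(full, x, y)
        if r > best_r:
            best_seq, best_r = full, r
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return best_seq, best_r

x = np.linspace(-2.0, 2.0, 50)
y = x * x + x
seq, r = mcts_decode(x, y, iterations=2000)
print(" ".join(seq), round(r, 4))
```

The reward is computed from the evaluated expression, not from token likelihoods, so it never needs to be differentiable. Swapping in a different accuracy or complexity measure only changes `reward`. In a transformer-guided variant, `rollout` and the child priors would come from the model instead of uniform sampling.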
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Symbolic Regression | Oscillation 1 (LLM-SR Suite) | NMSE | 0.0048 | 30 |
| Symbolic Regression | Feynman Dataset ε = 0.0 (test) | R² | 0.9921 | 20 |
| Symbolic Regression | Feynman Dataset ε = 0.001 (test) | R² | 99.2 | 20 |
| Symbolic Regression | Feynman Dataset ε = 0.01 (test) | R² | 0.9911 | 20 |
| Symbolic Regression | Feynman Dataset ε = 0.1 (test) | R² | 0.9836 | 20 |
| Symbolic Regression | Strogatz Dataset ε = 0.0 (test) | R² | 0.965 | 20 |
| Symbolic Regression | Strogatz Dataset ε = 0.001 (test) | R² | 0.9216 | 20 |
| Symbolic Regression | Strogatz Dataset ε = 0.01 (test) | R² | 0.9798 | 20 |
| Symbolic Regression | Strogatz Dataset ε = 0.1 (test) | R² | 97.07 | 20 |
| Symbolic Regression | CRK (ID) | NMSE | 8.48e-7 | 18 |
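The results above use two metrics, NMSE and R². A minimal sketch of common definitions follows. It assumes NMSE is the mean squared error divided by the population variance of the targets, and R² is 1 - SSE/SST. Individual benchmarks may normalize differently, so treat these as assumptions. Under these definitions R² = 1 - NMSE. Two entries (99.2 and 97.07) appear to report R² on a percentage scale, since R² cannot exceed 1. The function names `nmse` and `r2_score` are illustrative. scikit-learn's `sklearn.metrics.r2_score` computes the same R².

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized MSE: mean squared error divided by the variance of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SSE / SST."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - sse / sst)

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(round(nmse(y, y_hat), 4), round(r2_score(y, y_hat), 4))  # prints 0.02 0.98
```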