Transformer-based Planning for Symbolic Regression

About

Symbolic regression (SR) is a challenging task in machine learning that involves finding a mathematical expression for a function based on its values. Recent advancements in SR have demonstrated the effectiveness of pre-trained transformer-based models in generating equations as sequences, leveraging large-scale pre-training on synthetic datasets and offering notable advantages in terms of inference time over classical Genetic Programming (GP) methods. However, these models primarily rely on supervised pre-training goals borrowed from text generation and overlook equation discovery objectives like accuracy and complexity. To address this, we propose TPSR, a Transformer-based Planning strategy for Symbolic Regression that incorporates Monte Carlo Tree Search into the transformer decoding process. Unlike conventional decoding strategies, TPSR enables the integration of non-differentiable feedback, such as fitting accuracy and complexity, as external sources of knowledge into the transformer-based equation generation process. Extensive experiments on various datasets show that our approach outperforms state-of-the-art methods, enhancing the model's fitting-complexity trade-off, extrapolation abilities, and robustness to noise.
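The abstract says the search is guided by non-differentiable feedback such as fitting accuracy and complexity, but it does not give a formula. The sketch below shows one way such a feedback score could be computed for a finished candidate equation. Everything in it is an assumption made for illustration: the function names `nmse` and `reward`, the weight `lam`, the length cap `max_tokens`, and the token counts attached to each candidate are all hypothetical and not taken from the abstract.

```python
import math

import numpy as np


def nmse(y_true, y_pred):
    """Mean squared error divided by the variance of the targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Small constant guards against division by zero for constant targets.
    return float(np.mean((y_true - y_pred) ** 2) / (np.var(y_true) + 1e-12))


def reward(y_true, y_pred, num_tokens, lam=0.1, max_tokens=200):
    """Hypothetical score: a fitting term plus a weighted simplicity term.

    Both terms are non-differentiable with respect to the model, which is
    why they would be used as search feedback rather than as a training loss.
    """
    fit = 1.0 / (1.0 + nmse(y_true, y_pred))          # 1.0 for a perfect fit
    simplicity = math.exp(-num_tokens / max_tokens)   # shorter is better
    return fit + lam * simplicity


# Toy data generated from y = x^2 + x.
x = np.linspace(-2.0, 2.0, 50)
y = x ** 2 + x

# Candidate equations: (predictions, assumed token count of the expression).
candidates = {
    "x**2 + x": (x ** 2 + x, 5),
    "x**2": (x ** 2, 3),
    "x": (x, 1),
}

scores = {name: reward(y, pred, n) for name, (pred, n) in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

In a planning-based decoder, a score of this kind would be evaluated on completed equations and propagated back through the search tree, so that decoding favors sequences that fit the data well without growing unnecessarily long.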

Parshin Shojaee, Kazem Meidani, Amir Barati Farimani, Chandan K. Reddy • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Symbolic Regression | Oscillation 1 LLM-SR Suite | NMSE 0.0048 | 30 |
| Symbolic Regression | Strogatz Dataset epsilon=0.1 (test) | R2 97.07 | 20 |
| Symbolic Regression | Feynman Dataset epsilon=0.001 (test) | R2 99.2 | 20 |
| Symbolic Regression | Feynman Dataset epsilon=0.01 (test) | R2 0.9911 | 20 |
| Symbolic Regression | Feynman Dataset ϵ = 0.0 (test) | R^2 0.9921 | 20 |
| Symbolic Regression | Strogatz Dataset epsilon=0.01 (test) | R2 Score 0.9798 | 20 |
| Symbolic Regression | Feynman Dataset epsilon=0.1 (test) | R2 Score 0.9836 | 20 |
| Symbolic Regression | Strogatz Dataset ϵ = 0.0 (test) | R^2 0.965 | 20 |
| Symbolic Regression | Strogatz Dataset epsilon=0.001 (test) | R2 Score 0.9216 | 20 |
| Symbolic Regression | CRK (ID) | NMSE 8.48e-7 | 18 |
Showing 10 of 28 rows
