GenSR: Symbolic Regression Based in Equation Generative Space

About

Symbolic Regression (SR) tries to reveal the hidden equations behind observed data. However, most methods search within a discrete equation space, where structural modifications of equations rarely align with their numerical behavior, leaving fitting-error feedback too noisy to guide exploration. To address this challenge, we propose GenSR, a generative latent space-based SR framework following the "map construction -> coarse localization -> fine search" paradigm. Specifically, GenSR first pretrains a dual-branch Conditional Variational Autoencoder (CVAE) to reparameterize symbolic equations into a generative latent space with symbolic continuity and local numerical smoothness. This space can be regarded as a well-structured "map" of the equation space, providing directional signals for search. At inference, the CVAE coarsely localizes the input data to promising regions in the latent space. Then, a modified CMA-ES refines the candidate region, leveraging smooth latent gradients. From a Bayesian perspective, GenSR reframes the SR task as maximizing the conditional distribution $p(\mathrm{Equ.} \mid \mathrm{Num.})$, with CVAE training achieving this objective through the Evidence Lower Bound (ELBO). This new perspective provides a theoretical guarantee for the effectiveness of GenSR. Extensive experiments show that GenSR jointly optimizes predictive accuracy, expression simplicity, and computational efficiency, while remaining robust under noise.
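The "coarse localization -> fine search" idea can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the GenSR implementation: `decode` is a hypothetical stand-in for the CVAE decoder that maps a 2-D latent vector to the coefficients of a fixed template `a*sin(x) + b*x` (a real decoder would emit equation tokens), `coarse_localize` replaces the conditional encoder with best-of-N sampling, and `fine_search` is a simplified evolution strategy with isotropic noise rather than the modified CMA-ES named in the abstract.

```python
import numpy as np


def decode(z, x):
    # Hypothetical toy decoder: the latent vector sets the coefficients of a
    # fixed template a*sin(x) + b*x (assumption, not the GenSR decoder).
    return z[0] * np.sin(x) + z[1] * x


def fitting_error(z, x, y):
    # Mean squared error between the decoded equation and the observed data.
    return float(np.mean((decode(z, x) - y) ** 2))


def coarse_localize(x, y, rng, n_samples=64):
    # Stand-in for the conditional encoder: draw latent candidates from a
    # broad Gaussian and keep the one with the lowest fitting error.
    candidates = rng.normal(0.0, 2.0, size=(n_samples, 2))
    errors = [fitting_error(z, x, y) for z in candidates]
    return candidates[int(np.argmin(errors))]


def fine_search(z0, x, y, rng, sigma=0.5, pop=32, elite=8, iters=60):
    # Simplified evolution strategy (isotropic noise, elite averaging,
    # decaying step size). It does not adapt a covariance matrix, so it is
    # only a rough analogue of CMA-ES.
    mean = z0.copy()
    best_z, best_err = z0.copy(), fitting_error(z0, x, y)
    for _ in range(iters):
        samples = mean + sigma * rng.normal(size=(pop, mean.size))
        errors = np.array([fitting_error(z, x, y) for z in samples])
        order = np.argsort(errors)
        if errors[order[0]] < best_err:
            best_err = float(errors[order[0]])
            best_z = samples[order[0]].copy()
        mean = samples[order[:elite]].mean(axis=0)  # move toward the elites
        sigma *= 0.95  # shrink the search radius over time
    return best_z, best_err


def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    x = np.linspace(-3.0, 3.0, 50)
    y = 1.5 * np.sin(x) - 0.7 * x  # synthetic target data
    z_coarse = coarse_localize(x, y, rng)
    z_fine, err_fine = fine_search(z_coarse, x, y, rng)
    return fitting_error(z_coarse, x, y), err_fine, z_fine


if __name__ == "__main__":
    err_coarse, err_fine, z_fine = run_demo()
    print("coarse error:", err_coarse)
    print("fine error:", err_fine)
    print("refined latent:", z_fine)
```

In this toy setting the fitting error varies smoothly with the latent vector, which is the property the abstract attributes to the learned latent space; the refinement step only needs error values at sampled points, not explicit gradients.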

Qian Li, Yuxiao Hu, Juncheng Liu, Yuntian Chen • 2026

Related benchmarks

Task	Dataset	Result
Symbolic Regression	SRBench black-box (test)	R^2 0.842	71
Symbolic Regression	SRBench Strogatz (test)	Mean Test R^2 0.994	59
Symbolic Regression	SRBench Feynman (test)	Mean Test R^2 98.9	57
Symbolic Regression	Strogatz Dataset epsilon=0.01 (test)	R2 Score 0.9936	20
Symbolic Regression	Strogatz Dataset epsilon=0.1 (test)	R2 97.73	20
Symbolic Regression	Strogatz Dataset epsilon=0.001 (test)	R2 Score 0.9951	20
Symbolic Regression	Strogatz Dataset ϵ = 0.0 (test)	R^2 0.9918	20
Symbolic Regression	Feynman Dataset epsilon=0.1 (test)	R2 Score 0.9886	20
Symbolic Regression	Feynman Dataset ϵ = 0.0 (test)	R^2 0.9872	20
Symbolic Regression	Feynman Dataset epsilon=0.001 (test)	R2 98.83	20

Showing 10 of 11 rows
