Teaching the Teacher: The Role of Teacher-Student Smoothness Alignment in Genetic Programming-based Symbolic Distillation
About
Obtaining human-readable symbolic formulas via genetic programming-based symbolic distillation of a deep neural network trained on the target dataset presents a promising yet underexplored path towards explainable artificial intelligence (XAI); however, the standard pipeline frequently yields symbolic models with poor predictive accuracy. We identify a fundamental misalignment in functional complexity as the primary barrier to better accuracy: standard Artificial Neural Networks (ANNs) often learn accurate but highly irregular functions, while Symbolic Regression typically prioritizes parsimony, producing a much simpler class of models that cannot faithfully approximate the ANN teacher. To bridge this gap, we propose a framework that actively regularizes the teacher's functional smoothness using Jacobian and Lipschitz penalties, aiming to distill better student models than the standard pipeline. We characterize the trade-off between predictive accuracy and functional complexity through a study spanning 20 datasets with 50 independent trials each. Our results demonstrate that students distilled from smoothness-regularized teachers achieve statistically significant improvements in R^2 scores compared to the standard pipeline. We also perform ablation studies on the student model algorithm. Our findings suggest that smoothness alignment between teacher and student models is a critical factor for symbolic distillation.
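To make the smoothness regularization concrete, here is a minimal sketch of a Jacobian penalty added to a teacher's training loss. This is an illustrative reconstruction, not the paper's implementation: the function names (`jacobian_penalty`, `smooth_loss`), the finite-difference approximation of the Jacobian, and the penalty weight `lam` are all assumptions chosen for clarity. The idea is to penalize the expected squared Frobenius norm of the model's input Jacobian, discouraging highly irregular functions.

```python
import numpy as np

def jacobian_penalty(f, X, eps=1e-4):
    """Approximate E_x[ ||J_f(x)||_F^2 ] by central finite differences.

    f   : vectorized model mapping an (n, d) array to an (n,) array
    X   : (n, d) batch of inputs at which smoothness is measured
    eps : finite-difference step (hypothetical choice)
    """
    n, d = X.shape
    total = 0.0
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        # central difference estimate of ∂f/∂x_j at every sample
        df = (f(X + e) - f(X - e)) / (2.0 * eps)
        total += np.mean(df ** 2)
    return total

def smooth_loss(f, X, y, lam=0.1):
    """MSE plus a Jacobian smoothness term, weighted by lam (assumed value)."""
    mse = np.mean((f(X) - y) ** 2)
    return mse + lam * jacobian_penalty(f, X)
```

For a linear map such as f(x) = 2·x₁ + 3·x₂, the finite-difference estimate is exact and the penalty equals 2² + 3² = 13; a Lipschitz penalty would instead bound the worst-case (rather than average) gradient norm.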
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Symbolic Distillation | Airfoil Self-Noise | Mean R^2 0.3872 | 3 |
| Symbolic Distillation | Elec. Grid Stability | Mean R^2 0.6814 | 3 |
| Symbolic Distillation | Real Estate (val) | Mean R^2 0.4992 | 3 |
| Symbolic Distillation | AI4I Maintenance 2020 | Mean R^2 0.6255 | 3 |
| Symbolic Distillation | Auto MPG | Mean R^2 0.7615 | 3 |
| Symbolic Distillation | Combined Cycle Power | Mean R^2 0.9105 | 3 |
| Symbolic Distillation | Cholesterol | Mean R^2 0.2138 | 3 |
| Symbolic Distillation | Munich Rent Index | Mean R^2 0.14 | 3 |
| Symbolic Distillation | SEA 50000 | Mean R^2 0.3985 | 3 |
| Symbolic Distillation | Kin8nm | Mean R^2 0.3024 | 3 |