
Reinforcing Numerical Reasoning in LLMs for Tabular Prediction via Structural Priors

About

Tabular prediction traditionally relies on gradient-boosted decision trees and deep learning models, which excel in specific tasks but lack interpretability and transferability. Reasoning large language models (LLMs) promise cross-task adaptability with transparent reasoning traces, yet their potential for tabular data remains unrealized. To bridge this gap, we propose a reasoning framework centered on Permutation Relative Policy Optimization (PRPO), a reinforcement learning method that encodes column-permutation invariance as a structural prior. By estimating advantages across label-preserving permutations, PRPO transforms sparse rewards into dense signals, activating latent numerical reasoning capabilities of LLMs with limited supervision. Extensive experiments show that our method matches fully supervised baselines and dominates in zero-shot settings, performing on par with 32-shot strong baselines. Remarkably, our 8B model significantly outperforms much larger LLMs, achieving up to a 53.17% improvement over DeepSeek-R1 (685B).
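The abstract's core idea is to estimate advantages across a group of label-preserving column permutations of the same example, so a sparse per-rollout reward becomes a dense relative signal. A minimal sketch of that group-relative normalization is below; the function names, the exact-match reward, and the toy row are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of PRPO-style advantage estimation (names assumed,
# not from the paper): rewards from rollouts on K column permutations of
# the same example are normalized within the group, turning a sparse 0/1
# reward into a dense relative signal.
import random
import statistics


def permute_columns(row: dict, rng: random.Random) -> list:
    """Return the row's (column, value) pairs in a random order.

    Reordering columns does not change the label, so every
    permutation in the group shares the same ground truth.
    """
    items = list(row.items())
    rng.shuffle(items)
    return items


def prpo_advantages(rewards: list) -> list:
    """Group-relative advantages: (r - mean) / std over the permutation group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]


# Toy usage: serialize one row under 4 permutations, then suppose the model
# answered correctly on 2 of the 4 rollouts (exact-match reward assumed).
rng = random.Random(0)
row = {"age": 39, "education": "Bachelors", "hours_per_week": 40}
group = [permute_columns(row, rng) for _ in range(4)]
rewards = [1.0, 0.0, 1.0, 0.0]
advs = prpo_advantages(rewards)
```

Because advantages are centered within the permutation group, correct rollouts get a positive signal and incorrect ones a negative signal even when the raw reward is binary.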

Pengxiang Cai, Zihao Gao, Wanchen Lian, Jintai Chen • 2025

Related benchmarks

Task                    | Dataset                              | Result               | Rank
Tabular Classification  | 53 classification datasets (unseen)  | Mean Accuracy 75.42  | 18
Tabular Regression      | 21 regression datasets               | Mean NMAE 0.111      | 15
Tabular Classification  | 50 classification datasets           | Mean Accuracy 84.36  | 10
Tabular Regression      | 15 regression datasets               | Mean NMAE 0.1499     | 10
