
Reinforcing Numerical Reasoning in LLMs for Tabular Prediction via Structural Priors

About

Tabular prediction traditionally relies on gradient-boosted decision trees and deep learning models, which excel in specific tasks but lack interpretability and transferability. Reasoning large language models (LLMs) promise cross-task adaptability with transparent reasoning traces, yet their potential for tabular data remains unrealized. To bridge this gap, we propose a reasoning framework centered on Permutation Relative Policy Optimization (PRPO), a reinforcement learning method that encodes column-permutation invariance as a structural prior. By estimating advantages across label-preserving permutations, PRPO transforms sparse rewards into dense signals, activating latent numerical reasoning capabilities of LLMs with limited supervision. Extensive experiments show that our method matches fully supervised baselines and dominates in zero-shot settings, performing on par with 32-shot strong baselines. Remarkably, our 8B model significantly outperforms much larger LLMs, achieving up to a 53.17% improvement over DeepSeek-R1 (685B).
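The abstract's core idea is to estimate advantages across a group of label-preserving column permutations of the same example, so a sparse per-rollout reward becomes a dense relative signal. A minimal sketch of that group-relative normalization is below; the function names, the exact-match reward, and the toy row are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of PRPO-style advantage estimation (names assumed,
# not from the paper): rewards from rollouts on K column permutations of
# the same example are normalized within the group, turning a sparse 0/1
# reward into a dense relative signal.
import random
import statistics


def permute_columns(row: dict, rng: random.Random) -> list:
    """Return the row's (column, value) pairs in a random order.

    Reordering columns does not change the label, so every
    permutation in the group shares the same ground truth.
    """
    items = list(row.items())
    rng.shuffle(items)
    return items


def prpo_advantages(rewards: list) -> list:
    """Group-relative advantages: (r - mean) / std over the permutation group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]


# Toy usage: serialize one row under 4 permutations, then suppose the model
# answered correctly on 2 of the 4 rollouts (exact-match reward assumed).
rng = random.Random(0)
row = {"age": 39, "education": "Bachelors", "hours_per_week": 40}
group = [permute_columns(row, rng) for _ in range(4)]
rewards = [1.0, 0.0, 1.0, 0.0]
advs = prpo_advantages(rewards)
```

Because advantages are centered within the permutation group, correct rollouts get a positive signal and incorrect ones a negative signal even when the raw reward is binary.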

Pengxiang Cai, Zihao Gao, Wanchen Lian, Jintai Chen • 2025

Related benchmarks

Task                    | Dataset                              | Result               | Rank
Tabular Classification  | 53 classification datasets (unseen)  | Mean Accuracy 75.42  | 18
Tabular Regression      | 21 regression datasets               | Mean NMAE 0.111      | 15
Tabular Classification  | 50 classification datasets           | Mean Accuracy 84.36  | 10
Tabular Regression      | 15 regression datasets               | Mean NMAE 0.1499     | 10
