LLM Meeting Decision Trees on Tabular Data

About

Tabular data play a vital role in diverse real-world fields, including healthcare and finance. Following the recent success of Large Language Models (LLMs), early work has explored extending LLMs to the domain of tabular data. Most of these LLM-based methods first serialize tabular data into natural-language descriptions and then either fine-tune LLMs or run inference directly on the serialized data. However, these methods suffer from two inherent issues: (i) from the data perspective, existing serialization methods lack universal applicability to structured tabular data and may pose privacy risks through direct textual exposure; and (ii) from the model perspective, LLM fine-tuning struggles with tabular data, and the scalability of in-context learning is bottlenecked by input-length constraints, which restricts it to few-shot learning. This work explores a novel direction: integrating LLMs with tabular data through logical decision tree rules as intermediaries. We propose DeLTa, a decision tree enhancer with LLM-derived rules for tabular prediction. DeLTa avoids tabular data serialization and can be applied in the full-data learning setting without LLM fine-tuning. Specifically, we leverage the reasoning ability of LLMs to redesign an improved rule given a set of decision tree rules. Furthermore, we provide a method for calibrating the original decision trees with the new LLM-generated rule; it approximates an error correction vector that steers the original decision tree predictions in the direction that reduces errors. Finally, extensive experiments on diverse tabular benchmarks show that our method achieves state-of-the-art performance.
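The abstract does not spell out the algorithm, so the following is only a minimal sketch under stated assumptions. It renders a decision tree's root-to-leaf paths as IF-THEN rule strings (the kind of text an LLM could be prompted with instead of serialized table rows), and it approximates the calibration step by fitting the mean residual of the original tree inside and outside the region selected by a new rule, then adding that correction to the tree's predictions, much like a single gradient-boosting step. The names `Node`, `extract_rules`, `fit_correction`, `calibrated_predict`, `llm_rule`, and the `step` parameter are hypothetical, and `llm_rule` is hand-written rather than produced by an actual LLM; this is not DeLTa's implementation.

```python
# Minimal sketch (assumption): rule extraction plus a residual-based correction.
# This is NOT DeLTa's actual method; all names here are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    """A binary decision-tree node; a leaf has feature=None."""
    feature: Optional[str] = None  # split feature name (None for a leaf)
    threshold: float = 0.0         # samples go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0             # prediction stored at a leaf


def extract_rules(node: Node, conds: Tuple[str, ...] = ()) -> List[str]:
    """Turn every root-to-leaf path into an 'IF ... THEN ...' rule string."""
    if node.feature is None:
        body = " AND ".join(conds) if conds else "TRUE"
        return [f"IF {body} THEN y = {node.value}"]
    return (extract_rules(node.left, conds + (f"{node.feature} <= {node.threshold}",))
            + extract_rules(node.right, conds + (f"{node.feature} > {node.threshold}",)))


def predict(node: Node, x: Dict[str, float]) -> float:
    """Route one sample down the tree and return the leaf value."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


Rule = Callable[[Dict[str, float]], bool]


def fit_correction(tree: Node, rule: Rule, X, y) -> Dict[bool, float]:
    """Mean residual y - tree(x) inside (True) and outside (False) the rule's region."""
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for x, target in zip(X, y):
        region = rule(x)
        sums[region] += target - predict(tree, x)
        counts[region] += 1
    return {r: sums[r] / counts[r] if counts[r] else 0.0 for r in sums}


def calibrated_predict(tree: Node, rule: Rule, correction: Dict[bool, float],
                       x: Dict[str, float], step: float = 1.0) -> float:
    """Original tree prediction shifted by the scaled correction for x's region."""
    return predict(tree, x) + step * correction[rule(x)]


def llm_rule(x: Dict[str, float]) -> bool:
    """Hypothetical stand-in for a rule an LLM might propose from the tree rules."""
    return x["age"] > 30 and x["income"] > 50


# Toy data: a one-split tree that underestimates high-income older samples.
tree = Node("age", 30.0, Node(value=1.0), Node(value=3.0))
X = [{"age": 25, "income": 40}, {"age": 28, "income": 60},
     {"age": 40, "income": 30}, {"age": 45, "income": 80}]
y = [1.0, 1.0, 3.0, 4.0]

print("\n".join(extract_rules(tree)))  # text that could go into an LLM prompt
correction = fit_correction(tree, llm_rule, X, y)
print([calibrated_predict(tree, llm_rule, correction, x) for x in X])
```

On this toy data the fitted correction is `{True: 1.0, False: 0.0}`, so only the fourth sample's prediction moves, from 3.0 to 4.0. Because the correction is a per-region mean residual, it cannot increase the training squared error when `step=1.0`; a smaller `step` would damp the adjustment.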

Hangting Ye, Jinmeng Li, He Zhao, Dandan Guo, Yi Chang • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Regression | Fiat500 (TabArena v0.1, test) | RMSE | 719.7 | 10
Regression | Insurance (TabArena v0.1, test) | RMSE | 4.54e+3 | 10
Binary Classification | Marketing (TabArena v0.1, test) | ROC AUC (%) | 88.3 | 10
Binary Classification | CreditG (TabArena v0.1, test) | ROC AUC (%) | 77.4 | 10
Binary Classification | QSARBio (TabArena v0.1, test) | ROC AUC (%) | 92.3 | 10
Binary Classification | Hazelnut (TabArena v0.1, test) | ROC AUC (%) | 97.4 | 10
Regression | Airfoil (TabArena v0.1, test) | RMSE | 1.432 | 10
Regression | Concrete (TabArena v0.1, test) | RMSE | 4.82 | 10
Binary Classification | Diabetes (TabArena v0.1, test) | ROC AUC (%) | 80.9 | 10
Binary Classification | Customer (TabArena v0.1, test) | ROC AUC (%) | 70.6 | 10

(10 of 17 rows shown.)
