xRFM: Accurate, scalable, and interpretable feature learning models for tabular data

About

Inference from tabular data, collections of continuous and categorical variables organized into matrices, is a foundation for modern technology and science. Yet, in contrast to the explosive changes in the rest of AI, the best practice for these predictive tasks has remained relatively unchanged and is still primarily based on variations of Gradient Boosted Decision Trees (GBDTs). Very recently, there has been renewed interest in developing state-of-the-art methods for tabular data based on recent developments in neural networks and feature learning methods. In this work, we introduce xRFM, an algorithm that combines feature learning kernel machines with a tree structure to both adapt to the local structure of the data and scale to essentially unlimited amounts of training data. We show that, compared to 31 other methods, including recently introduced tabular foundation models (TabPFNv2) and GBDTs, xRFM achieves the best performance across 100 regression datasets and is competitive with the best methods across 200 classification datasets, outperforming GBDTs. Additionally, xRFM provides interpretability natively through the Average Gradient Outer Product.

Daniel Beaglehole, David Holzmüller, Adityanarayanan Radhakrishnan, Mikhail Belkin • 2025
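
The interpretability claim in the abstract rests on the Average Gradient Outer Product (AGOP). For a scalar predictor f and samples x_1, ..., x_n, the AGOP is the matrix (1/n) Σ ∇f(x_i) ∇f(x_i)^T, and its diagonal can be read as a per-feature importance score. The sketch below is a minimal NumPy illustration of that definition only. It is not the xRFM implementation or its API; the toy predictor, its hand-written gradient, and the names `agop` and `grad_f` are assumptions made for the example.

```python
import numpy as np

def agop(grad_fn, X):
    """Average Gradient Outer Product: (1/n) * sum_i g(x_i) g(x_i)^T."""
    G = np.stack([grad_fn(x) for x in X])  # gradients stacked as rows, shape (n, d)
    return G.T @ G / len(X)                # (d, d) symmetric PSD matrix

# Hypothetical predictor f(x) = 3*x0 + x1**2; it does not depend on x2.
def grad_f(x):
    return np.array([3.0, 2.0 * x[1], 0.0])

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))              # synthetic inputs, for illustration only

M = agop(grad_f, X)
importance = np.diag(M)                    # one non-negative score per feature
print(importance)
```

In this toy setting the unused feature x2 gets an importance of exactly zero, while x0 gets 9 (the squared slope). For a trained model, the gradient would come from the fitted predictor instead of a closed-form expression.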

Related benchmarks

Task	Dataset	Result
Binary Classification	TALENT (test)	Top-1 Accuracy 29.6	113
Binary Classification	TabArena	Elo Rating 1.31e+3	74
Multiclass Classification	TabArena Lite	Elo Rating 1.36e+3	63
Tabular Learning	TabArena	Elo 1.34e+3	54
Regression	TabArena Lite	Elo 1.56e+3	48
Multiclass Classification	TALENT	SGMε 10.7	42
Classification	Covertype	--	40
Multiclass Classification	TALENT Multiclass (> 10 classes) Full (avg across datasets)	Rank 5.46	31
Regression	TALENT 100 datasets	Rank 4.7	28
Classification	higgs	ERR 26.4	19

Showing 10 of 28 rows
