Chunked TabPFN: Exact Training-Free In-Context Learning for Long-Context Tabular Data

About

TabPFN v2 achieves better results than tree-based models on several tabular benchmarks, which is notable since tree-based models are usually the strongest choice for tabular data. However, it cannot handle more than 10K context tokens because transformers have quadratic computation and memory costs. Unlike existing approaches that rely on context compression, such as selecting representative samples via K-nearest neighbors (KNN), we introduce a tiled-block strategy to compute attention within the TabPFN framework. This design is compatible with standard GPU setups and, to the best of our knowledge, is the first to enable TabPFN to process long contexts without any pre-processing. We demonstrate the effectiveness of our approach on the standard TabArena benchmark, with code available at https://github.com/mrsergazinov/chunk_tabpfn.

Renat Sergazinov, Shao-An Yin• 2025

Related benchmarks

Task	Dataset	Result
Classification	Adult	Accuracy91.3	86
Classification	Diabetes	Accuracy81.64	80
Classification	Credit	--	63
Classification	German	Accuracy79.4	58
Classification	blood	ROC-AUC0.7805	47
Tabular Prediction	TabArena all 51 datasets	Elo Rating1.27e+3	38
Tabular Classification	Heart	Mean AUC-ROC92.78	31
Classification	Synthetic	Accuracy91.8	31
Classification	Census KDD	Accuracy87.59	26
Tabular Learning	Long-context 15 datasets v2 (test)	Avg. Normalized RMSE0.534	9

Showing 10 of 15 rows

Other info

Follow for update

@wizwand_team Discord