Valid Feature-Level Inference for Tabular Foundation Models via the Conditional Randomization Test
About
Modern machine learning models are highly expressive but notoriously difficult to analyze statistically. In particular, while black-box predictors can achieve strong empirical performance, they rarely provide valid hypothesis tests or p-values for assessing whether individual features contain information about a target variable. This article presents a practical approach to feature-level hypothesis testing that combines the Conditional Randomization Test (CRT) with TabPFN, a probabilistic foundation model for tabular data. The resulting procedure yields finite-sample valid p-values for conditional feature relevance, even in nonlinear and correlated settings, without requiring model retraining or parametric assumptions.
Mohamed Salem • 2026
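Concretely, the test for a single feature j compares the model's held-out predictive fit using the real column against its fit when that column is resampled from a model of X_j given the remaining features. The sketch below illustrates this under stated assumptions: a Gaussian linear conditional sampler, a squared-error test statistic, and a `TabPFNRegressor` with a scikit-learn-style `fit`/`predict` interface (any compatible regressor can be substituted). It is a minimal illustration, not the article's exact implementation.

```python
# Minimal CRT sketch around a swappable predictive model. TabPFNRegressor and
# its scikit-learn-style fit/predict interface are assumptions here; any
# sklearn-compatible regressor is a drop-in replacement.
import numpy as np
from sklearn.linear_model import LinearRegression

def crt_pvalue(model, X, y, j, n_draws=199, test_frac=0.3, seed=0):
    """CRT p-value for H0: X[:, j] is independent of y given the other columns."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y)
    test = rng.permutation(n)[: int(test_frac * n)]
    train = np.setdiff1d(np.arange(n), test)
    X_rest = np.delete(X, j, axis=1)

    # Conditional model for X_j | X_{-j}. A Gaussian linear model is assumed
    # purely for illustration; the CRT needs a valid conditional sampler,
    # ideally known or estimated from independent data.
    cond = LinearRegression().fit(X_rest, X[:, j])
    mu = cond.predict(X_rest)
    sigma = np.std(X[:, j] - mu)

    def statistic(xj):
        """Held-out predictive fit when column j is replaced by `xj`."""
        Xc = X.copy()
        Xc[:, j] = xj
        # For an in-context model like TabPFN, fit() is conditioning on the
        # training rows rather than retraining, so each draw is cheap.
        model.fit(Xc[train], y[train])
        return -np.mean((y[test] - model.predict(Xc[test])) ** 2)

    t_obs = statistic(X[:, j])  # statistic with the real feature
    t_null = [statistic(mu + sigma * rng.standard_normal(n)) for _ in range(n_draws)]

    # Under H0, t_obs is exchangeable with the null draws, which gives a
    # finite-sample valid p-value (Candes et al., 2018).
    return (1 + sum(t >= t_obs for t in t_null)) / (n_draws + 1)
```

A model would then be plugged in as, e.g., `crt_pvalue(TabPFNRegressor(), X, y, j=0)`, assuming the `tabpfn` package's regressor. With `n_draws=199` the smallest attainable p-value is 1/200; validity holds for any number of draws, since the guarantee comes from the exchangeability of the observed and resampled statistics under the null.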
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Conditional Randomization Test | Linear sparse | Statistical Power | 1 | 1 |
| Conditional Randomization Test | Linear dense | Power | 1 | 1 |
| Conditional Randomization Test | Weak signal | Statistical Power | 1 | 1 |
| Conditional Randomization Test | Noise block | Power (%) | 100 | 1 |
| Conditional Randomization Test | Correlated linear | Power | 1 | 1 |
| Conditional Randomization Test | Friedman 1 | Statistical Power | 1 | 1 |
| Conditional Randomization Test | Friedman 2 | Statistical Power | 0.6 | 1 |
| Conditional Randomization Test | Friedman 3 | Statistical Power | 0.4 | 1 |
| Conditional Randomization Test | XOR interaction | Statistical Power | 1 | 1 |
| Conditional Randomization Test | Threshold feature | Power | 1 | 1 |