
Valid Feature-Level Inference for Tabular Foundation Models via the Conditional Randomization Test

About

Modern machine learning models are highly expressive but notoriously difficult to analyze statistically. In particular, while black-box predictors can achieve strong empirical performance, they rarely provide valid hypothesis tests or p-values for assessing whether individual features contain information about a target variable. This article presents a practical approach to feature-level hypothesis testing that combines the Conditional Randomization Test (CRT) with TabPFN, a probabilistic foundation model for tabular data. The resulting procedure yields finite-sample valid p-values for conditional feature relevance, even in nonlinear and correlated settings, without requiring model retraining or parametric assumptions.
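The CRT procedure described above can be sketched in a few lines. The snippet below is a simplified illustration, not the article's implementation: it swaps TabPFN's learned conditional for a Gaussian linear model of X_j given the other features, and the function name `crt_pvalue`, the resample count, and the test statistic are illustrative choices.

```python
import numpy as np

def crt_pvalue(X, y, j, statistic, n_resamples=200, seed=None):
    """Conditional Randomization Test p-value for feature j.

    Models X[:, j] | X[:, -j] with a Gaussian linear conditional
    (a stand-in for TabPFN's conditional model), resamples X[:, j]
    from it, and compares the statistic on real vs. resampled data.
    """
    rng = np.random.default_rng(seed)
    # Fit the conditional distribution of feature j given the rest.
    A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    mu = A @ beta
    sigma = np.std(X[:, j] - mu)

    t_obs = statistic(X, y)
    t_null = np.empty(n_resamples)
    for k in range(n_resamples):
        Xk = X.copy()
        Xk[:, j] = mu + sigma * rng.standard_normal(len(X))  # null copy
        t_null[k] = statistic(Xk, y)
    # Finite-sample valid p-value: the +1 terms make the test exact.
    return (1 + np.sum(t_null >= t_obs)) / (1 + n_resamples)

# Toy check: y depends on feature 0 only; feature 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 3))
y = X[:, 0] + 0.1 * rng.standard_normal(300)

p_signal = crt_pvalue(X, y, 0, lambda X, y: abs(np.corrcoef(X[:, 0], y)[0, 1]), seed=1)
p_noise = crt_pvalue(X, y, 2, lambda X, y: abs(np.corrcoef(X[:, 2], y)[0, 1]), seed=1)
```

Note that any test statistic can be plugged in; the article's power numbers below come from pairing this resampling scheme with a TabPFN-based statistic rather than the simple correlation used here.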

Mohamed Salem • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Conditional Randomization Test | Linear sparse | Statistical Power | 1.00 | 1 |
| Conditional Randomization Test | Linear dense | Statistical Power | 1.00 | 1 |
| Conditional Randomization Test | Weak signal | Statistical Power | 1.00 | 1 |
| Conditional Randomization Test | Noise block | Statistical Power | 1.00 | 1 |
| Conditional Randomization Test | Correlated linear | Statistical Power | 1.00 | 1 |
| Conditional Randomization Test | Friedman 1 | Statistical Power | 1.00 | 1 |
| Conditional Randomization Test | Friedman 2 | Statistical Power | 0.60 | 1 |
| Conditional Randomization Test | Friedman 3 | Statistical Power | 0.40 | 1 |
| Conditional Randomization Test | XOR interaction | Statistical Power | 1.00 | 1 |
| Conditional Randomization Test | Threshold feature | Statistical Power | 1.00 | 1 |

Showing 10 of 11 rows.
