Valid Feature-Level Inference for Tabular Foundation Models via the Conditional Randomization Test
About
Modern machine learning models are highly expressive but notoriously difficult to analyze statistically. In particular, while black-box predictors can achieve strong empirical performance, they rarely provide valid hypothesis tests or p-values for assessing whether individual features contain information about a target variable. This article presents a practical approach to feature-level hypothesis testing that combines the Conditional Randomization Test (CRT) with TabPFN, a probabilistic foundation model for tabular data. The resulting procedure yields finite-sample valid p-values for conditional feature relevance, even in nonlinear and correlated settings, without requiring model retraining or parametric assumptions.
Mohamed Salem • 2026
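Concretely, the test for a single feature j compares the model's held-out predictive fit using the real column against its fit when that column is resampled from a model of X_j given the remaining features. The sketch below illustrates this under stated assumptions: a Gaussian linear conditional sampler, a squared-error test statistic, and a `TabPFNRegressor` with a scikit-learn-style `fit`/`predict` interface (any compatible regressor can be substituted). It is a minimal illustration, not the article's exact implementation.

```python
# Minimal CRT sketch around a swappable predictive model. TabPFNRegressor and
# its scikit-learn-style fit/predict interface are assumptions here; any
# sklearn-compatible regressor is a drop-in replacement.
import numpy as np
from sklearn.linear_model import LinearRegression

def crt_pvalue(model, X, y, j, n_draws=199, test_frac=0.3, seed=0):
    """CRT p-value for H0: X[:, j] is independent of y given the other columns."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y)
    test = rng.permutation(n)[: int(test_frac * n)]
    train = np.setdiff1d(np.arange(n), test)
    X_rest = np.delete(X, j, axis=1)

    # Conditional model for X_j | X_{-j}. A Gaussian linear model is assumed
    # purely for illustration; the CRT needs a valid conditional sampler,
    # ideally known or estimated from independent data.
    cond = LinearRegression().fit(X_rest, X[:, j])
    mu = cond.predict(X_rest)
    sigma = np.std(X[:, j] - mu)

    def statistic(xj):
        """Held-out predictive fit when column j is replaced by `xj`."""
        Xc = X.copy()
        Xc[:, j] = xj
        # For an in-context model like TabPFN, fit() is conditioning on the
        # training rows rather than retraining, so each draw is cheap.
        model.fit(Xc[train], y[train])
        return -np.mean((y[test] - model.predict(Xc[test])) ** 2)

    t_obs = statistic(X[:, j])  # statistic with the real feature
    t_null = [statistic(mu + sigma * rng.standard_normal(n)) for _ in range(n_draws)]

    # Under H0, t_obs is exchangeable with the null draws, which gives a
    # finite-sample valid p-value (Candes et al., 2018).
    return (1 + sum(t >= t_obs for t in t_null)) / (n_draws + 1)
```

A model would then be plugged in as, e.g., `crt_pvalue(TabPFNRegressor(), X, y, j=0)`, assuming the `tabpfn` package's regressor. With `n_draws=199` the smallest attainable p-value is 1/200; validity holds for any number of draws, since the guarantee comes from the exchangeability of the observed and resampled statistics under the null.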
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Conditional Randomization Test | Linear sparse | Statistical Power | 1 | 1 |
| Conditional Randomization Test | Linear dense | Power | 1 | 1 |
| Conditional Randomization Test | Weak signal | Statistical Power | 1 | 1 |
| Conditional Randomization Test | Noise block | Power (%) | 100 | 1 |
| Conditional Randomization Test | Correlated linear | Power | 1 | 1 |
| Conditional Randomization Test | Friedman 1 | Statistical Power | 1 | 1 |
| Conditional Randomization Test | Friedman 2 | Statistical Power | 0.6 | 1 |
| Conditional Randomization Test | Friedman 3 | Statistical Power | 0.4 | 1 |
| Conditional Randomization Test | XOR interaction | Statistical Power | 1 | 1 |
| Conditional Randomization Test | Threshold feature | Power | 1 | 1 |