
Trajectory-Based Difficulty Scoring for Reliable Learning on Tabular Data

About

Gradient-boosted trees achieve strong performance on tabular data, yet often leave a long tail of poorly predicted instances. We introduce a Trajectory-based Difficulty Score (TDS), an instance-level difficulty estimator for boosted ensembles derived from per-tree cumulative prediction trajectories. For each instance, we compute interpretable trajectory descriptors (e.g., variance, oscillation peaks, sign switches, and tail stability) and train a lightweight regression model to predict held-out loss. An empirical CDF calibrates the resulting signal into a score in $[0,1]$ that supports ranking hard cases. Across diverse tabular benchmarks and ensemble sizes, TDS exhibits strong rank correlation with error and outperforms established instance-hardness and uncertainty baselines on classification, while remaining competitive on regression. We then show how a single difficulty signal improves multiple data mining workflows: difficulty-driven active learning for label-efficient training, difficulty-thresholded selective prediction for improved risk-coverage trade-offs, and TDS-stratified (Mondrian) conformal prediction for more uniform conditional coverage. Finally, clustering high-TDS instances using SHAP attributions reveals coherent failure modes characterized by compact feature-value ranges, supporting error analysis and targeted data acquisition.
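As a rough illustration of the pipeline described above, the following Python sketch computes a few trajectory descriptors from per-tree contributions and calibrates a raw difficulty signal with an empirical CDF. The function names (`trajectory_descriptors`, `ecdf_calibrate`), the `tail_frac` parameter, and the exact descriptor formulas are assumptions made for illustration, since the abstract does not specify them. The regression model that predicts held-out loss is omitted; its output would play the role of the raw signal passed to `ecdf_calibrate`.

```python
import numpy as np


def trajectory_descriptors(per_tree, tail_frac=0.2):
    """Summarize each instance's cumulative prediction trajectory.

    per_tree: (n_instances, n_trees) array of per-tree contributions
    (e.g. margin increments). The descriptor definitions below are
    illustrative assumptions, not the paper's exact formulas.
    """
    per_tree = np.asarray(per_tree, dtype=float)
    traj = np.cumsum(per_tree, axis=1)   # prediction after each tree
    steps = np.diff(traj, axis=1)        # change between consecutive trees
    n_tail = max(2, int(round(tail_frac * traj.shape[1])))
    return {
        # spread of the cumulative prediction over the whole ensemble
        "variance": traj.var(axis=1),
        # how often the cumulative prediction crosses zero
        "sign_switches": np.sum(traj[:, 1:] * traj[:, :-1] < 0, axis=1),
        # local extrema: consecutive steps that point in opposite directions
        "oscillation_peaks": np.sum(steps[:, 1:] * steps[:, :-1] < 0, axis=1),
        # standard deviation over the last trees (lower means more stable)
        "tail_std": traj[:, -n_tail:].std(axis=1),
    }


def ecdf_calibrate(reference, raw):
    """Map raw difficulty values to [0, 1] using the empirical CDF of a
    reference sample: the fraction of reference values <= each raw value."""
    ref = np.sort(np.asarray(reference, dtype=float))
    return np.searchsorted(ref, raw, side="right") / ref.size


# Toy input: one oscillating instance and one that converges monotonically.
demo = trajectory_descriptors([[1.0, -2.0, 3.0, -4.0],
                               [1.0, 1.0, 1.0, 1.0]], tail_frac=0.5)
scores = ecdf_calibrate([0.1, 0.2, 0.3, 0.4], [0.25, 0.05, 0.4])
```

In this toy input the first instance oscillates around zero and therefore receives more sign switches and a larger tail standard deviation than the second, which is the kind of contrast a downstream difficulty model could pick up on.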

Tomer Lavi, Bracha Shapira, Nadav Rappoport • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Active Learning | Multiple Datasets (test) | AULC | 58.78 | 33 |
| Regression | Mean across regression datasets (val) | RMSE | 11.081 | 33 |
| Regression | Mean across regression datasets (test) | AULC | 13.899 | 33 |
| Active Learning | Multiple Datasets (val) | LL | 58.24 | 33 |
| Regression | Multiple Datasets | Pearson r | 0.239 | 15 |
| Regression | Average of Regression Datasets (Adult, WiDS, Bike Sharing, Cal. Housing) (test) | Coverage | 90.5 | 12 |
| Selective Prediction | Classification Datasets Average (test) | NAURC | 71.6 | 12 |
| Classification | Multiple Datasets | Pearson r | 0.375 | 12 |
| Selective Prediction | Regression Datasets Average (test) | NAURC | 0.486 | 9 |
| Classification | Average of Classification Datasets (Adult, WiDS, Bike Sharing, Cal. Housing) (test) | Covariance | 0.904 | 9 |
