| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Classification | Titanic | Accuracy 78.5 | 28 |
| Regression | Titanic | Standard Deviation 0 | 20 |
| Classification | titanic (val) | AUROC 0.883 | 12 |
| Classification | Titanic (test) | Accuracy 80.5 | 9 |
| Classification | Titanic | AUC 0.8736 | 8 |
| Classification | titanic 25% feature noise | Accuracy 78.9 | 7 |
| Classification | titanic | F1-score 85.2 | 7 |
| Classification | titanic 0% noise (test) | Accuracy 79.1 | 7 |
| Classification | titanic (0% noise) | F1 Score 85.6 | 7 |
| Classification | titanic 25% feature noise | F1-score 86.2 | 7 |
| Classification | titanic 25% label noise | F1 Score 85.9 | 7 |
| Regression | Titanic | Average Relative Absolute Error 38 | 6 |
| Faithfulness under retraining | Titanic | AURC 13.988 | 5 |
| Binary Classification | Titanic (test) | Macro F1-score 77.6 | 5 |
| domain-specific question answering | titanic | Accuracy 75.51 | 5 |
| Classification | Titanic (out-of-sample) | Median AUC 0.8736 | 5 |
| Instance attribution explanation | Titanic (test) | Wall-clock Time (seconds) 0.03 | 4 |
| Target sensitivity estimation | Titanic (test) | Pearson r 1 | 3 |
| Binary Classification | TITANIC | R50013 | 3 |
| p-robustness Estimation | TITANIC | R50044 | 3 |
| Lexical Coverage Analysis | Titanic | Coverage 95 | 2 |
| Data Contamination Detection | Titanic | Metric - | 0 |
| Fairness-aware classification | titanic (test) | Metric - | 0 |