| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| LF Mislabeling Identification | Spambase | AP 92.7 | 32 |
| End model evaluation | Spambase | Test Loss 0.258 | 22 |
| Outlier Detection | SpamBase ADBench | AUROC 66.2 | 17 |
| Classification | spambase | Accuracy 93.5 | 15 |
| Outlier Detection | SpamBase (Group I) | AUROC 66.2 | 14 |
| Classification | Spambase (test) | Test Loss 0.283 | 13 |
| Outlier Detection | SpamBase | AUC 0.9021 | 11 |
| Outlier Detection | SpamBase | AP 86.31 | 11 |
| Anomaly Detection | SpamBase | AUPRC 89.24 | 10 |
| Anomaly Detection | SpamBase Out-of-Domain | F1 Score 81.8 | 10 |
| Spam Classification | Spambase 0% missingness (test) | AUC 98.73 | 10 |
| Hierarchical Clustering | Spambase | Dasgupta's Cost 34,261,369.825 | 10 |
| Hierarchical Clustering | Spambase | DP 75.5 | 10 |
| Classification | Spambase | F1 Score 93.92 | 9 |
| CASH | spambase (test) | Test Error 0.0591 | 9 |
| Private Decision Tree Evaluation | spambase | Online Running Time 31.2 | 8 |
| Classification | Spambase (5-fold cross-val) | Accuracy 92.23 | 7 |
| Abductive Explanation Generation | Spambase (test) | Average Execution Time (ms) 2.92 | 6 |
| Mixture Proportion Estimation | UCI spambase (test) | Absolute Error 0.006 | 6 |
| PvN classification | UCI Spambase (test) | Accuracy 89.4 | 6 |
| Active Learning Classification | Spambase | F1 Score 78.5 | 5 |
| Decision Tree Evaluation | spambase | Overall Latency (s) 19,700 | 4 |
| Binary Classification | Spambase (test) | Macro F1 Score 91.7 | 4 |
| Abductive Explanation Generation | Spambase Rejected | Avg Explanation Size 48.89 | 3 |
| Abductive Explanation Generation | Spambase | Avg Explanation Size 1.24 | 3 |