| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Classification | Credit | ROC AUC 98.6 | 63 |
| Numerical Reasoning | Credit | Hit Rate @ 1 74.5 | 24 |
| Contamination detection | credit | Acomp 0.81 | 24 |
| Faithful Narrative Generation | credit | RA 100 | 16 |
| Counterfactual Explanations | Credit (test) | IM1 1.0767 | 16 |
| Fairness Classification | Credit (test) | Disparate Impact (DP) 0.1762 | 14 |
| Fairness Evaluation | Credit (test) | PP 5.66 | 14 |
| Classification | credit (test) | EOpp 0 | 14 |
| Node Classification | Credit Dataset | BACC 69.95 | 14 |
| Data Imputation | Credit-g | Accuracy 54.46 | 13 |
| Graph Multiple Sensitive Attribute Inference Attack | Credit | AA (Attribute Accuracy) 74.71 | 10 |
| Counterfactual Generation | Credit | Runtime (minutes) 0 | 9 |
| Classification | credit (UCI) | Accuracy 83.1 | 9 |
| Node Classification | Credit r: 0.01 (test) | ACC (%) 78.13 | 9 |
| Classification | credit small | Accuracy 99.39 | 9 |
| Classification | Credit | Error Rate 24.79 | 9 |
| Actionable Counterfactual Generation | Credit 1994 (test) | Validity 100 | 9 |
| Tabular Classification | Credit | Clean Accuracy 83.3 | 9 |
| Multi-class classification | Credit | Macro F1 Score 0.48 | 9 |
| Classification | Credit | Individual Fairness Gap 12.8 | 8 |
| Strategic Classification | Credit | Post-manipulation Accuracy 83.96 | 8 |
| Node Classification | Credit (test) | Accuracy 73.79 | 8 |
| Full black box attack | Credit (test) | FBB ROC AUC 0.71 | 8 |
| Explanation Generation | Credit | PPL 4.3 | 7 |
| Classification | Credit age (test) | F1 Score 83.47 | 7 |