| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Treatment Effect Estimation | NEWS semi-synthetic | Mean Error | 0 | 22 |
| Treatment Effect Estimation | NEWS semi-synthetic (test) | MSE | 0 | 22 |
| Summarization | news multi | Rouge-L | 23.66 | 21 |
| Named Entity Recognition | NEWS | F1 Score | 86.15 | 21 |
| English-German document-level translation | News English-German (test) | s-BLEU | 30.34 | 20 |
| News Recommendation | NEWS (test) | AUC | 64.68 | 18 |
| Out-of-Distribution Detection | News (test) | AUROC | 80.7 | 17 |
| Out-of-Distribution Detection | News | FPR | 69.31 | 17 |
| Regression | News (test) | MSE | 0.69 | 17 |
| LLM Unlearning | NEWS | Verification Memory (VerMem) | 22.09 | 16 |
| Individual Treatment Effect (ITE) Estimation | NEWS (out) | PEHE | 0.44 | 16 |
| Individual Treatment Effect (ITE) Estimation | NEWS (in) | PEHE | 0.25 | 16 |
| ATE estimation | News | Joint Bias (JB) | 0.07 | 14 |
| Machine Text Detection | News | Claude 3.5 Rewrite AUC | 1 | 11 |
| Named Entity Recognition | News (test) | F1 Score | 80.86 | 10 |
| Retrieval Question Answering | News in-domain | MRR | 46.6 | 10 |
| Classification | News | Macro Precision | 93.87 | 9 |
| Classification | News (test) | Average Inference Time (s) | 0.5233 | 9 |
| Hierarchical Agglomerative Clustering | news | AMI | 0.627 | 9 |
| Clustering | news | ARI | 47 | 9 |
| Scientific Text Simplification | News | d-BLEU | 4.61 | 9 |
| Text Classification | News FTC-metadataset mini 10% | AUROC | 99.07 | 8 |
| Text Classification | News FTC-metadataset full | AURAC | 0.9837 | 8 |
| Text Classification | News FTC-metadataset full | NLL | 0.1423 | 8 |
| Text Classification | News FTC-metadataset full | Average Prediction Set Size | 1.2228 | 8 |