| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Paraphrase Detection | MRPC | Avg Accuracy 89.9 | 89 |
| Sentence Similarity | MRPC (test) | F1 (micro) 78.36 | 44 |
| Paraphrase Detection | MRPC GLUE (val) | Accuracy 85.54 | 27 |
| Paraphrase Detection | MRPC | Delta Accuracy 0 | 24 |
| Paraphrase Identification | MRPC | Delta 1-4.73 | 24 |
| Paraphrase Detection | MRPC | Accuracy 90.43 | 14 |
| Paraphrase Detection | MRPC | Spearman Correlation (x100) 30.87 | 12 |
| Ranking correlation with full dataset evaluation | MRPC | Kendall Correlation 0.65 | 10 |
| Classification | MRPC (test) | Macro F1 81.2 | 9 |
| Paraphrase | NO-MRPC NLEBench (test) | Accuracy 73.7 | 6 |
| Classification | MRPC | Accuracy 86.52 | 6 |
| Paraphrase Detection | MRPC (val) | F1 Score 93.8 | 6 |
| Paraphrase Detection | MRPC (dev) | F1 Score 91 | 6 |
| Semantic Equivalence | MRPC | Success Rate 70 | 5 |
| Paraphrase Identification | MRPC | Accuracy 76 | 5 |
| Paraphrase Detection | MRPC GLUE (test) | F1 Score 88 | 5 |
| Natural Language Inference | MRPC | Accuracy 0.736 | 5 |
| Binary Classification | MRPC | AUC 77.77 | 5 |
| Paraphrase Detection | MRPC | Δ Accuracy -0.04 | 3 |
| Indirect Prompt Injection Sanitization | MRPC | GCG ASR 0.5 | 2 |
| Indirect Prompt Injection Attack | MRPC | ASR 98.5 | 2 |
| Natural Language Understanding | MRPC (test) | Accuracy 89.46 | 2 |
| Text Classification | MRPC GLUE | Accuracy 93.32 | 2 |
| Paraphrase Identification | MRPC | F1 Score 88.9 | 2 |
| Indirect Prompt Injection Detection | MRPC | GCG Accuracy 95 | 1 |