| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Natural Language Inference | RTE | Accuracy | 90 | 367 |
| Text Classification | RTE | Accuracy | 84.4 | 78 |
| Natural language inference | RTE (test) | Accuracy | 90.25 | 52 |
| Natural Language Inference | RTE | Accuracy (0-shot) | 84.8 | 42 |
| Recognizing Textual Entailment | RTE (test) | Accuracy | 76.53 | 26 |
| Natural Language Inference | RTE (val) | Accuracy | 0.918 | 24 |
| Recognizing Textual Entailment | RTE | Delta | 126.24 | 24 |
| Natural Language Inference | RTE | Avg Accuracy | 81.2 | 21 |
| Recognizing Textual Entailment | RTE (Recognizing Textual Entailment) GLUE (val) | Accuracy | 66.06 | 18 |
| Zero-shot Prediction | RTE | Zero-shot Accuracy (RTE) | 62.82 | 17 |
| Recognizing Textual Entailment | RTE | Accuracy | 83.13 | 16 |
| Natural Language Inference | RTE (dev) | Accuracy | 90.5 | 12 |
| Recognizing Textual Entailment | RTE | Total Communication Time ($10^3$ s) | 4.29 | 9 |
| Natural Language Inference | RTE GLUE (test dev) | Accuracy | 84 | 8 |
| Natural Language Inference | RTE SuperGLUE (test) | Accuracy | 66.13 | 8 |
| Natural Language Inference | RTE | F1 Score | 80.91 | 7 |
| Natural language inference | RTE | Macro-F1 | 65.8 | 6 |
| Recognizing Textual Entailment | RTE | F1 Macro | 92.1 | 5 |
| Natural Language Inference | RTE | Delta Accuracy | -0.03 | 3 |
| Indirect Prompt Injection Sanitization | RTE | GCG ASR | 1 | 2 |
| Indirect Prompt Injection Attack | RTE | Attack Success Rate | 93 | 2 |
| Indirect Prompt Injection Detection | RTE | GCG Accuracy | 92.5 | 1 |