RTE

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Natural Language Inference | RTE | Accuracy 90 | 367 |
| Text Classification | RTE | Accuracy 84.4 | 78 |
| Natural language inference | RTE (test) | Accuracy 90.25 | 52 |
| Natural Language Inference | RTE | Accuracy (0-shot) 84.8 | 42 |
| Recognizing Textual Entailment | RTE (test) | Accuracy 76.53 | 26 |
| Natural Language Inference | RTE (val) | Accuracy 0.918 | 24 |
| Recognizing Textual Entailment | RTE | Delta 126.24 | 24 |
| Natural Language Inference | RTE | Avg Accuracy 81.2 | 21 |
| Recognizing Textual Entailment | RTE (Recognizing Textual Entailment) GLUE (val) | Accuracy 66.06 | 18 |
| Zero-shot Prediction | RTE | Zero-shot Accuracy (RTE) 62.82 | 17 |
| Recognizing Textual Entailment | RTE | Accuracy 83.13 | 16 |
| Natural Language Inference | RTE (dev) | Accuracy 90.5 | 12 |
| Recognizing Textual Entailment | RTE | Total Communication Time ($10^3$ s) 4.29 | 9 |
| Natural Language Inference | RTE GLUE (test dev) | Accuracy 84 | 8 |
| Natural Language Inference | RTE SuperGLUE (test) | Accuracy 66.13 | 8 |
| Natural Language Inference | RTE | F1 Score 80.91 | 7 |
| Natural language inference | RTE | Macro-F1 65.8 | 6 |
| Recognizing Textual Entailment | RTE | F1 Macro 92.1 | 5 |
| Natural Language Inference | RTE | Delta Accuracy -0.03 | 3 |
| Indirect Prompt Injection Sanitization | RTE | GCG ASR 1 | 2 |
| Indirect Prompt Injection Attack | RTE | Attack Success Rate 93 | 2 |
| Indirect Prompt Injection Detection | RTE | GCG Accuracy 92.5 | 1 |
Showing 22 of 22 rows