
RTE

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Natural Language Inference | RTE | Accuracy 93.5 | 448 |
| Text Classification | RTE | Accuracy 84.4 | 96 |
| Natural language inference | RTE (test) | Accuracy 90.25 | 52 |
| Recognizing Textual Entailment | RTE | Accuracy 83.13 | 47 |
| Natural Language Inference | RTE | Accuracy (0-shot) 84.8 | 42 |
| Recognizing Textual Entailment | RTE (test) | Accuracy 76.53 | 26 |
| Natural Language Inference | RTE (val) | Accuracy 0.918 | 24 |
| Recognizing Textual Entailment | RTE | Delta 126.24 | 24 |
| Natural Language Inference | RTE | Avg Accuracy 81.2 | 21 |
| Recognizing Textual Entailment | RTE (Recognizing Textual Entailment) GLUE (val) | Accuracy 66.06 | 18 |
| Zero-shot Prediction | RTE | Zero-shot Accuracy (RTE) 62.82 | 17 |
| Natural Language Inference | RTE | Normalized Accuracy 94.4 | 13 |
| Natural Language Inference | RTE (dev) | Accuracy 90.5 | 12 |
| Recognizing Textual Entailment | RTE | Total Communication Time ($10^3$ s) 4.29 | 9 |
| Recognizing Textual Entailment | RTE | Repair Accuracy 100 | 8 |
| Natural Language Inference | RTE GLUE (test dev) | Accuracy 84 | 8 |
| Natural Language Inference | RTE SuperGLUE (test) | Accuracy 66.13 | 8 |
| Natural Language Inference | RTE | F1 Score 80.91 | 7 |
| Natural language inference | RTE | Macro-F1 65.8 | 6 |
| Recognizing Textual Entailment | RTE | F1 Macro 92.1 | 5 |
| Natural Language Inference | RTE | Accuracy (RTE) 61 | 3 |
| Natural Language Inference | RTE | Delta Accuracy -0.03 | 3 |
| Recognizing Textual Entailment | RTE | Accuracy 79.06 | 2 |
| Indirect Prompt Injection Sanitization | RTE | GCG ASR 1 | 2 |
| Indirect Prompt Injection Attack | RTE | Attack Success Rate 93 | 2 |
Showing 25 of 26 rows