Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

PAWS

Benchmarks

Task NameDataset NameSOTA ResultTrend
Semantic Hallucination DetectionPAWS
AUROC91.24
36
Paraphrase DetectionPAWS
Accuracy94.6
24
Paraphrase DetectionPAWS original (test)
Accuracy91.79
23
Paraphrase IdentificationPAWS -> PAWS (test)
Accuracy94.1
22
Paraphrase IdentificationPAWS -> QQP (test)
AUROC77.7
20
Paraphrase IdentificationPAWS
Accuracy97.6
17
Paraphrase DetectionPAWS-QQP
Accuracy96
16
Paraphrase DetectionPAWS Wiki
Accuracy86.8
12
Paraphrase IdentificationPAWS Out-of-distribution from PIT
Macro F155.1
10
Paraphrase IdentificationPAWS -> WMT (test)
AUROC0.849
10
Ranking correlation with full dataset evaluationPAWS Wiki
Kendall Correlation0.96
10
Performance PredictionPAWS
MAE0.8
9
Zero-shot performance predictionPAWS
MAE1.92
9
Ranking correlation with full dataset evaluationPAWS QQP
Kendall Correlation0.87
7
Showing 14 of 14 rows