
PAWS

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Semantic Hallucination Detection | PAWS | AUROC 91.24 | 36 |
| Paraphrase Detection | PAWS | Accuracy 94.6 | 24 |
| Paraphrase Detection | PAWS original (test) | Accuracy 91.79 | 23 |
| Paraphrase Identification | PAWS -> PAWS (test) | Accuracy 94.1 | 22 |
| Paraphrase Identification | PAWS -> QQP (test) | AUROC 77.7 | 20 |
| Paraphrase Identification | PAWS | Accuracy 97.6 | 17 |
| Paraphrase Detection | PAWS-QQP | Accuracy 96 | 16 |
| Paraphrase Identification | PAWS Out-of-distribution from PIT | Macro F1 55.1 | 10 |
| Paraphrase Identification | PAWS -> WMT (test) | AUROC 0.849 | 10 |
| Ranking correlation with full dataset evaluation | PAWS Wiki | Kendall Correlation 0.96 | 10 |
| Performance Prediction | PAWS | MAE 0.8 | 9 |
| Zero-shot performance prediction | PAWS | MAE 1.92 | 9 |
| Paraphrase Detection | PAWS Wiki | Accuracy 47.5 | 8 |
| Ranking correlation with full dataset evaluation | PAWS QQP | Kendall Correlation 0.87 | 7 |