CoQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hallucination Detection | CoQA | AUROC 84.92 | 108 |
| Hallucination Detection | CoQA | Mean AUROC 0.8584 | 107 |
| Selective Generation | CoQA | ROC-AUC 74.7 | 66 |
| Question Answering | CoQA | CACC 76.31 | 64 |
| Uncertainty Estimation | CoQA | AUROC 0.857 | 58 |
| Question Answering | CoQA | PRR 0.423 | 44 |
| Hallucination Detection | CoQA | AUCs 77.5 | 42 |
| Uncertainty estimation | CoQA (test) | AUROC 77.3 | 42 |
| Question Answering | CoQA alpha = 0.25 (test) | Empirical Error Rate (EER) 0.2347 | 40 |
| Question Answering | CoQA alpha = 0.25 (filtering stage) | EER 23.47 | 40 |
| Hallucination Detection | CoQA | AUROC 91.74 | 39 |
| Language Generation | CoQA | Accuracy 65.5 | 35 |
| Conversational Question Answering | COQA zero-shot (test) | Exact Match (EM) 70.85 | 32 |
| Conversational Question Answering | CoQA | Accuracy 75.9 | 29 |
| Question Answering | CoQA | F1 Score 76 | 28 |
| Conversational Question Answering | CoQA | PRR 40.7 | 22 |
| Free-form text generation | CoQA | Accuracy 94.61 | 22 |
| Question Answering | COQA | Factual Accuracy 28.27 | 21 |
| Hallucination detection | CoQA | AUROC 0.98 | 20 |
| Selective Prediction | CoQA | PRR 80.6 | 20 |
| Hallucination Detection | CoQA | AUPRC 89.01 | 20 |
| Conversational Question Answering | CoQA official (test) | Overall F1 88.8 | 17 |
| Poisoned Sample Detection | CoQA (IID) | Recall 100 | 16 |
| Poisoned sample detection | CoQA (NIID-1) | Recall 100 | 16 |
| Question Answering | CoQA | PR-AUC 60 | 16 |

Showing 25 of 59 rows