CoQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hallucination Detection | CoQA | Mean AUROC 0.8584 | 48 |
| Hallucination Detection | CoQA | AUCs 77.5 | 42 |
| Uncertainty estimation | CoQA (test) | AUROC 77.3 | 42 |
| Question Answering | CoQA alpha = 0.25 (test) | Empirical Error Rate (EER) 0.2347 | 40 |
| Question Answering | CoQA alpha = 0.25 (filtering stage) | EER 23.47 | 40 |
| Language Generation | CoQA | Accuracy 65.5 | 35 |
| Conversational Question Answering | COQA zero-shot (test) | Exact Match (EM) 70.85 | 32 |
| Conversational Question Answering | CoQA | Accuracy 75.9 | 29 |
| Question Answering | COQA | Factual Accuracy 28.27 | 21 |
| Conversational Question Answering | CoQA official (test) | Overall F1 88.8 | 17 |
| Question Answering | CoQA | PR-AUC 60 | 16 |
| Conversational Question Answering | CoQA (dev) | Overall F1 0.849 | 14 |
| Conversational Question Answering | COQA | AIBC 86.5 | 12 |
| Noisy-RAG Question Answering | CoQA | Exact Match (EM) 92.4 | 11 |
| Conversational Question Answering | CoQA | F1 Score 62.65 | 10 |
| Answer span extraction | CoQA (val) | EM 63.65 | 9 |
| Question Generation | CoQA (val) | Distinct-1 68.35 | 9 |
| Answer-unaware Conversational Question Generation | CoQA (dev) | Distinct-1 84.09 | 9 |
| Conversational Question Answering | CoQA | EM 60.3 | 8 |
| Question Answering | CoQA zero-shot (test) | F1 Score 73 | 6 |
| Question Answering | CoQA (val test) | F1 73 | 6 |
| Reading Comprehension | CoQA (dev) | F1 Score 85 | 6 |
| Conversational Question Answering | CoQA without human rewrites v1.0 (test) | Overall F1 83.4 | 6 |
| Dialogue Generation | CoQA CNN | BLEU 15.11 | 5 |
| Dialogue Generation | CoQA (MCTest) | BLEU 26.3 | 5 |
Showing 25 of 31 rows