NQ

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | NQ | Accuracy 66.6 | 108 |
| Hallucination Detection | NQ | AUC 0.8645 | 102 |
| Hallucination Detection | NQ (test) | AUC ROC 95.2 | 84 |
| Question Answering | NQ (test) | EM Accuracy 58.4 | 66 |
| Question Answering | NQ | EM 79 | 57 |
| Calibration | NQ | ECE 0.046 | 55 |
| Retrieval-Augmented Generation (RAG) | NQ | Reliability Score (RS) 54.33 | 52 |
| Table Question Answering | NQ-Table | F1 Score 80.1 | 50 |
| End-to-end Open-Domain Question Answering | NQ (test) | Exact Match (EM) 54 | 50 |
| General Question Answering | NQ | Exact Match (EM) 54.8 | 36 |
| Information Retrieval | NQ320k | Hits@1 40.4 | 32 |
| Question Answering | NQ | Accuracy 38 | 30 |
| Passage Ranking | NQ | MRR 52.76 | 29 |
| Question Answering | NQ | EM 39.5 | 28 |
| Prompt Injection Prevention | NQ simplified | Naïve Success Rate 41 | 24 |
| Confidence Calibration in Retrieval-Augmented Generation | NQ k=5 OOD (test) | ECE 0.248 | 24 |
| Question Answering | NQ-Open | Exact Match (EM) 47.4 | 24 |
| Retrieval-Augmented Generation | NQ | Accuracy 77.1 | 23 |
| Document Retrieval | NQ 320k (test) | Hits@1 63.4 | 23 |
| Document Retrieval | NQ 100K | Hits@1 27.5 | 23 |
| Document Retrieval | NQ10K | Hits@1 48.5 | 23 |
| Open-domain Question Answering | NQ (test) | EM 44.38 | 22 |
| Question Answering | NQ | Faith 0.9083 | 21 |
| Open-domain Question Answering | NQ-Open | Accuracy 29 | 20 |
| Information Retrieval | NQ BEIR | nDCG@10 62.76 | 20 |
Showing 25 of 110 rows