NQ

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | NQ | Accuracy 70 | 123 |
| Hallucination Detection | NQ | AUC 0.8645 | 102 |
| Question Answering | NQ (test) | AUROC 83 | 90 |
| Question Answering | NQ | Absolute Execution Time Overhead (s) 0.064 | 90 |
| Question Answering | NQ | PRR 0.65 | 90 |
| Question Answering | NQ (test) | EM Accuracy 66.4 | 86 |
| Hallucination Detection | NQ (test) | AUC ROC 95.2 | 84 |
| RAG Performance Prediction | NQ-Open | QE5 Score 0.793 | 80 |
| Question Answering | NQ | ACE Score 0.496 | 70 |
| Question Answering | NQ | ASR 99.65 | 70 |
| Question Answering | NQ | EM 79 | 69 |
| Question Answering | NQ | Accuracy 87 | 63 |
| Calibration | NQ | ECE 0.046 | 55 |
| General Question Answering | NQ | Exact Match (EM) 54.8 | 52 |
| Retrieval-Augmented Generation (RAG) | NQ | Reliability Score (RS) 54.33 | 52 |
| Table Question Answering | NQ-Table | F1 Score 80.1 | 50 |
| End-to-end Open-Domain Question Answering | NQ (test) | Exact Match (EM) 54 | 50 |
| Question Answering | NQ | Exact Match 72.57 | 46 |
| Single-Hop Question Answering | NQ | Exact Match (EM) 51.7 | 44 |
| Explaining LLMs | NQ | CRR 11.76 | 42 |
| General QA | NQ | EM 40.6 | 38 |
| Question Answering | NQ | NQ Recall (%) 90.6 | 36 |
| Information Retrieval | NQ320k | Hits@1 40.4 | 32 |
| Question Answering | NQ-Open | Exact Match (EM) 47.4 | 32 |
| Question Answering | NQ | F1 Score (NQ) 78.8 | 31 |
Showing 25 of 158 rows