
NQ

Benchmarks

Task Name | Dataset Name | Metric | SOTA Result | Trend
--- | --- | --- | --- | ---
Hallucination Detection | NQ | AUC | 0.889 | 154
Question Answering | NQ (test) | EM Accuracy | 66.4 | 133
Question Answering | NQ | Accuracy | 70 | 123
Question Answering | NQ | Accuracy | 87 | 113
Question Answering | NQ | Exact Match | 74.2 | 101
Hallucination Detection | NQ (test) | AUC ROC | 95.2 | 91
Question Answering | NQ (test) | AUROC | 83 | 90
Question Answering | NQ | Absolute Execution Time Overhead (s) | 0.064 | 90
Question Answering | NQ | PRR | 0.65 | 90
RAG Performance Prediction | NQ-Open | QE5 Score | 0.793 | 80
Open-Domain Question-Answering | NQ | Accuracy | 61.6 | 74
Question Answering | NQ | ACE Score | 0.496 | 70
Question Answering | NQ | ASR | 99.65 | 70
Question Answering | NQ | EM | 79 | 69
Question Answering | NQ | F1 Score (NQ) | 78.8 | 64
Table Question Answering | NQ-Table | F1 Score | 80.1 | 63
Single-Hop Question Answering | NQ | Exact Match (EM) | 51.7 | 60
End-to-end Open-Domain Question Answering | NQ (test) | Exact Match (EM) | 55.1 | 59
Question Answering | NQ | F1 Score | 44.11 | 56
Calibration | NQ | ECE | 0.046 | 55
General QA | NQ | EM | 46.9 | 54
Information Retrieval | NQ320k | Hits@1 | 48.92 | 54
General Question Answering | NQ | Exact Match (EM) | 54.8 | 52
Retrieval-Augmented Generation (RAG) | NQ | Reliability Score (RS) | 54.33 | 52
Retrieval-Augmented Question Answering | NQ | Clean Accuracy | 89 | 45
Showing 25 of 227 rows
...