
NQ-Open

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Hallucination detection | NQ-Open | AUROC | 0.8843 | 61 |
| Question Answering | NQ-Open (val) | Accuracy | 30.7 | 28 |
| Question Answering | NQ-Open In-Domain (test) | Precision | 58.13 | 26 |
| Factual Question Answering | NQ-Open ID | Precision | 57.34 | 24 |
| Question Answering | NQ-open v1.0 (test) | A1 | 79.08 | 16 |
| Hallucination Detection | NQ Open (test) | AUROC | 89.4 | 14 |
| Question Answering | NQ-Open (out-of-domain) | Precision | 0.705 | 12 |