
NQ-Open

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hallucination detection | NQ-Open | AUROC 0.8843 | 63 |
| Question Answering | NQ-Open (val) | Accuracy 49.62 | 46 |
| Question Answering | NQ-Open In-Domain (test) | Precision 58.13 | 26 |
| Factual Question Answering | NQ-Open ID | Precision 57.34 | 24 |
| Question Answering | NQ-open Augmented (full-slice) | Restate-hard 85.42 | 18 |
| Question Answering | NQ-open v1.0 (test) | A179.08 | 16 |
| Question Answering | NQ-Open Out-of-distribution (test) | Accuracy 49.93 | 15 |
| Hallucination Detection | NQ Open (test) | AUROC 89.4 | 14 |
| Question Answering | NQ-Open (out-of-domain) | Precision 0.705 | 12 |
| Question Answering | NQ-Open (test) | Mean F1 Score 25.61 | 10 |
| Open-domain Question Answering | NQ-Open OOD (test) | Exact Match (EM) 82.81 | 9 |