
Natural Questions

Benchmarks

Task Name | Dataset Name | Metric | SOTA Result | Trend
Open Question Answering | Natural Questions (NQ) (test) | Exact Match (EM) | 58.4 | 134
Retrieval Attack Defense | Natural Questions (NQ) | ASR | 0 | 99
Inference Efficiency | Natural Questions (NQ) | Relative Overhead (%) | 0.019 | 90
Open Domain Question Answering | Natural Questions (NQ) | Exact Match (EM) | 60.7 | 74
Over-refusal Evaluation | NQ (Natural Questions) | ORR | 0 | 72
Question Answering | Natural Questions (test) | EM | 61.65 | 72
Question Answering | NQ (Natural Questions) | EM | 78.3 | 70
Question Answering | Natural Questions (NQ) (test) | Exact Match | 76 | 68
Retrieval | Natural Questions (test) | Top-5 Recall | 92.1 | 62
Question Answering | NQ (Natural Questions) (test) | Accuracy | 68.6 | 60
Question Answering | Natural Questions | EM | 70.58 | 52
Question Answering | Natural Questions (NQ) | Accuracy | 49.3 | 48
Question Answering | Natural Questions (NQ) (test) | Robust Accuracy | 68 | 45
Knowledge Evaluation | Natural Questions (NQ) (Evaluation) | Accuracy | 83 | 45
Passage retrieval | Natural Questions (NQ) (test) | Top-20 Accuracy | 85.2 | 45
Embedding Alignment | Natural Questions (test) | Top-1 Accuracy | 100 | 40
Open-QA Evaluation | EVOUNA-NaturalQuestions | F1 Score | 97.9 | 35
Honesty Alignment | Natural Questions (NQ) In-Domain | AUROC | 85.16 | 33
Single-hop Question Answering | Natural Questions (NQ) (test) | EM | 47.5 | 33
Open-Domain Question Answering | NQ (Natural Questions) | EM | 51.4 | 33
Question Answering | NQ (Natural Questions) | EM | 42.5 | 28
Passage Retrieval | Natural Questions (NQ) | Top-10 Accuracy | 66.59 | 28
Closed-book Question Answering | Natural Questions (test) | Accuracy | 29.9 | 27
Question Answering | Natural Questions (test) | Speedup Ratio | 2.916 | 26
Information Retrieval | Natural Questions (test) | Recall@20 | 86.1 | 25
Showing 25 of 103 rows