Natural Questions

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Open Question Answering | Natural Questions (NQ) (test) | Exact Match (EM) 58.4 | 134 |
| Retrieval Attack Defense | Natural Questions (NQ) | ASR 0 | 99 |
| Inference Efficiency | Natural Questions (NQ) | Relative Overhead (%) 0.019 | 90 |
| Open Domain Question Answering | Natural Questions (NQ) | Exact Match (EM) 60.7 | 82 |
| Question Answering | Natural Questions (NQ) (test) | Exact Match 76 | 77 |
| Over-refusal Evaluation | NQ (Natural Questions) | ORR 0 | 72 |
| Question Answering | Natural Questions (test) | EM 61.65 | 72 |
| Question Answering | NQ (Natural Questions) | EM 78.3 | 70 |
| RAG Attack Defense | Natural Questions | ASR 0 | 63 |
| Retrieval | Natural Questions (test) | Top-5 Recall 92.1 | 62 |
| Question Answering | NQ (Natural Questions) (test) | Accuracy 68.6 | 60 |
| Single-hop QA | NQ (Natural Questions) | EM 72 | 52 |
| Question Answering | Natural Questions | EM 70.58 | 52 |
| Question Answering | Natural Questions (NQ) | Accuracy 49.3 | 48 |
| Watermarking | Natural Questions (NQ) (test) | AUROC 100 | 45 |
| Question Answering | Natural Questions (NQ) (test) | Robust Accuracy 68 | 45 |
| Knowledge Evaluation | Natural Questions (NQ) (Evaluation) | Accuracy 83 | 45 |
| Passage retrieval | Natural Questions (NQ) (test) | Top-20 Accuracy 85.2 | 45 |
| Embedding Alignment | Natural Questions (test) | Top-1 Accuracy 100 | 40 |
| Question Answering | Natural Questions | Accuracy 46.39 | 36 |
| Open-QA Evaluation | EVOUNA-NaturalQuestions | F1 Score 97.9 | 35 |
| Honesty Alignment | Natural Questions (NQ) In-Domain | AUROC 85.16 | 33 |
| Single-hop Question Answering | Natural Questions (NQ) (test) | EM 47.5 | 33 |
| Open-Domain Question Answering | NQ (Natural Questions) | EM 51.4 | 33 |
| Question Answering | Natural Questions (NQ) | Exact Match (EM) 45.68 | 32 |
Showing 25 of 122 rows