Open-domain Question Answering on NQ single-hop (test)
Best result: 47.7 Avg@16, Dr. MAS (Feb 9, 2026)
Metrics: Avg@16, Accuracy, F1 Score, Tokens, Pass@16
Updated 3mo ago
Evaluation Results
| Method | Links | Avg@16 | Accuracy | F1 Score | Tokens | Pass@16 |
|---|---|---|---|---|---|---|
| Dr. MAS (Backbone=Qwen2.5-7B, A...) | 2026.02 | 47.7 | - | - | - | 59.5 |
| Dr. MAS (Backbone=Qwen2.5-7B, A...) | 2026.02 | 47.4 | - | - | - | 60.7 |
| GRPO (Backbone=Qwen2.5-7B, A...) | 2026.02 | 46.4 | - | - | - | 57.6 |
| GRPO (Backbone=Qwen2.5-7B, A...) | 2026.02 | 45.2 | - | - | - | 60 |
| Dr. MAS (Backbone=Qwen2.5-3B, A...) | 2026.02 | 44.6 | - | - | - | 58.1 |
| Dr. MAS (Backbone=Qwen2.5-3B, A...) | 2026.02 | 43.8 | - | - | - | 58.5 |
| GRPO (Backbone=Qwen2.5-3B, A...) | 2026.02 | 43.8 | - | - | - | 54.5 |
| GRPO (Backbone=Qwen2.5-3B, A...) | 2026.02 | 41 | - | - | - | 59 |
| GRPO (Backbone=Qwen2.5-3B, A...) | 2026.02 | 40.6 | - | - | - | 54.7 |
| GRPO (Backbone=Qwen2.5-7B, A...) | 2026.02 | 27.1 | - | - | - | 39 |
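
The page reports Avg@16 and Pass@16 without defining them. As a rough sketch only, assuming the common convention (Avg@k is the mean correctness over k sampled answers per question, Pass@k is the share of questions with at least one correct answer among k samples), the two columns could be computed as below. The function names and the toy data are hypothetical and are not taken from the benchmark or from any method listed above.

```python
from typing import Sequence


def avg_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Assumed Avg@k: mean correctness over the k samples of each
    question, averaged over questions, as a percentage."""
    per_question = [sum(samples) / len(samples) for samples in correct]
    return 100.0 * sum(per_question) / len(per_question)


def pass_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Assumed Pass@k: percentage of questions for which at least one
    of the k samples is correct."""
    solved = [any(samples) for samples in correct]
    return 100.0 * sum(solved) / len(solved)


# Hypothetical toy data: 3 questions, k = 4 samples each
# (the leaderboard itself uses k = 16).
toy = [
    [True, False, True, False],    # 2 of 4 correct
    [False, False, False, False],  # never correct
    [False, True, False, False],   # 1 of 4 correct
]

print(f"Avg@4  = {avg_at_k(toy):.1f}")
print(f"Pass@4 = {pass_at_k(toy):.1f}")
```

Under this reading, Pass@k can never be lower than Avg@k for the same samples, which is consistent with every row reported on this page.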