SOTA General Question Answering benchmarks and papers with code | Wizwand
General Question Answering
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| TriviaQA | Agentic-R | Exact Match | 69.02 | 54 | 8d ago |
| NQ | SkillOrchestra+ | Exact Match (EM) | 54.8 | 52 | 2d ago |
| PopQA | InForage-instruct | EM | 45.2 | 51 | 8d ago |
| General QA NQ, TriviaQA, PopQA | Search-R1-GRPO + LLDS | NQ Accuracy | 51.8 | 34 | 1mo ago |
| TriviaQA (test val) | DeepControl | EM | 68.2 | 24 | 1mo ago |
| Natural Questions (NQ) (test val) | DeepControl | EM | 55.8 | 24 | 1mo ago |
| PopQA | SkillOrchestra+ | Accuracy | 48.8 | 18 | 1mo ago |
| TriviaQA | SkillOrchestra+ | Accuracy | 80.2 | 18 | 1mo ago |
| TriviaQA | Search-R1 | EM | 64.4 | 18 | 1mo ago |
| NQ (Natural Questions) | Search-R1 | EM | 46.1 | 18 | 1mo ago |
| TriviaQA | π-Play | Score | 64.6 | 16 | 2d ago |
| PopQA out-of-domain (val test) | Search-R2 | Exact Match (EM) | 50.1 | 15 | 1mo ago |
| TriviaQA out-of-domain (val test) | Search-R2 | EM | 70.9 | 15 | 1mo ago |
| NQ (Natural Questions) in-domain (val/test) | Search-R2 | Exact Match | 50.9 | 15 | 1mo ago |
| OOD Non-Math GPQA, CommonsenseQA | Baseline LLM | Pass@1 | 71.4 | 12 | 1mo ago |
| TriviaQA (test) | Search-R1 + EKA | F1 | 66.1 | 11 | 1mo ago |
| PopQA (test) | Workflow-R1-Search | EM | 49.3 | 10 | 1mo ago |
| TriviaQA (test) | Workflow-R1-Search | EM | 73.3 | 10 | 1mo ago |
| ExpertQA | DAPO+START | Reward | 0.2385 | 8 | 25d ago |
| TriviaQA | Pioneer Agent | Accuracy | 76.1 | 4 | 4d ago |
| PopQA (test val) | DeepControl | Exact Match (EM) | 52.1 | 4 | 1mo ago |
| MMLU-Pro (test) | GEPA | Mean Accuracy | 79.55 | 4 | 1mo ago |
| MMLU-Pro (test) | ETGPO | Optimization Token Usage | 595 | 3 | 1mo ago |
| UltraFeedback | GRPO+START | Reward | 0.1695 | 2 | 25d ago |