SOTA Knowledge-intensive reasoning benchmarks and papers with code | Wizwand
Knowledge-intensive reasoning
Benchmarks
| Dataset Name | SOTA Method | Metric | Trend (latest score) | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| HLE | R1-Searcher | Avg Score | 85 | 75 | 14d ago |
| MuSiQue | Llama3.1-8B + ARPO | F1 Score | 34.8 | 43 | 22h ago |
| HotpotQA | Llama3.1-8B + ARPO | F1 Score | 0.654 | 41 | 22h ago |
| SuperGPQA | Qwen3.5 | Overall Score | 63.4 | 31 | 14d ago |
| Musique | R1-Searcher | Accuracy | 87 | 31 | 3mo ago |
| Knowledge-Intensive Reasoning Suite 2Wiki., Bamb., HQA, MuSi., SimQA | Qwen2.5-7B + SFT-then-RL | 2Wiki Score | 58.4 | 25 | 1mo ago |
| Bamboogle | Llama3.1-8B + ARPO | F1 | 73.8 | 23 | 22h ago |
| 2wikiMultiHopQA | Qwen2.5-7B + GRPO | F1 Score | 76.1 | 18 | 3mo ago |
| WebWalker | Llama3.1-8B + ARPO | F1 Score | 30.5 | 18 | 3mo ago |
| 2WikiMultiHopQA | AutoTool (Qwen3-8B) | Accuracy | 48.8 | 18 | 3mo ago |
| HQA | AutoTraj | Average Score | 87 | 18 | 3mo ago |
| Bamboogle | EAPO | F1 Score | 60.4 | 15 | 22h ago |
| 2WikiMultihopQA | EAPO | F1 Score | 58.6 | 15 | 22h ago |
| GPQA | CPPO | Result Score | 38.89 | 14 | 1mo ago |
| GPQA ambiguity-augmented | DisambiguSLM | Accuracy | 42.8 | 11 | 1mo ago |
| 2Wiki | AutoTraj | Average Score | 0.89 | 9 | 3mo ago |
| C-Eval | Qwen3.5 | Score | 90.2 | 7 | 20d ago |
| MMLU-CF first 1,000 samples (test) | MGRS | Exact Match Accuracy | 74.2 | 7 | 3mo ago |
| Knowledge-intensive reasoning suite (HotpotQA, 2WikiMultihopQA, Musique) | TEPOdense | HotpotQA Score | 43.6 | 6 | 3mo ago |
| 2Wiki | EAPO | F1 Score | 52 | 5 | 22h ago |
| Generalization Verification | KDCM + Code Module | Hits@1 | 99.18 | 5 | 3mo ago |