General Question Answering on PopQA (test)
Top result: 49.3 EM (Workflow-R1-Search)
[Chart: EM by date; point shown at Feb 1, 2026]
Updated 4d ago
Evaluation Results

| Method | Settings | Links | EM |
|---|---|---|---|
| Workflow-R1-Search | Multi-turn=true, Backb... | 2026.02 | 49.3 |
| Search-R1 (GRPO) | Multi-turn=true, Backb... | 2026.02 | 42.7 |
| Search-R1 (PPO) | Multi-turn=true, Backb... | 2026.02 | 39.7 |
| Workflow-R1 | Multi-turn=true, Backb... | 2026.02 | 24.6 |
| AFlow | Multi-turn=false, Back... | 2026.02 | 23.2 |
| SC (CoT×5) | Multi-turn=false, Back... | 2026.02 | 21.4 |
| MedPrompt | Multi-turn=false, Back... | 2026.02 | 21.1 |
| CoT | Multi-turn=false, Back... | 2026.02 | 20.5 |
| MaAS | Multi-turn=false, Back... | 2026.02 | 19.3 |
| Direct | Multi-turn=false, Back... | 2026.02 | 13.3 |