Multiple-choice Reasoning on GPQA full dataset
Best result: Meta-Debate, 66.29% accuracy
[Chart: accuracy over time (y-axis: Accuracy; x-axis: date, Jan 23, 2026)]
Evaluation Results
| Method | Setting | Date | Accuracy (%) |
| --- | --- | --- | --- |
| Meta-Debate | Role assignment strate... | 2026.01 | 66.29 |
| Random assignment (Run 3) | Debate method=DMAD | 2026.01 | 60.27 |
| Meta-Debate | Role assignment strate... | 2026.01 | 59.15 |
| Claude for all roles | Debate method=DMAD | 2026.01 | 58.93 |
| Random assignment (Run 2) | Debate method=DMAD | 2026.01 | 58.26 |
| Random assignment (Run 3) | Debate method=MAD | 2026.01 | 55.58 |
| Nova for all roles | Debate method=DMAD | 2026.01 | 54.46 |
| Claude for all roles | Debate method=MAD | 2026.01 | 54.24 |
| Random assignment (Run 1) | Debate method=DMAD | 2026.01 | 53.57 |
| Nova for all roles | Debate method=MAD | 2026.01 | 52.46 |
| Random assignment (Run 2) | Debate method=MAD | 2026.01 | 52.23 |
| Random assignment (Run 1) | Debate method=MAD | 2026.01 | 50.67 |
| Pixtral for all roles | Debate method=DMAD | 2026.01 | 50.45 |
| Pixtral for all roles | Debate method=MAD | 2026.01 | 44.64 |
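Most configurations in these results appear twice, once with DMAD and once with MAD, so one way to read them is to compare the two debate methods per configuration and to average over the three random-assignment runs. The Python sketch below does that arithmetic on the accuracies listed above. It is an illustrative analysis, not part of the benchmark page: the helper names `dmad_minus_mad` and `random_run_summary` are hypothetical, and the two Meta-Debate rows are left out because their setting text is truncated ("Role assignment strate...").

```python
from statistics import mean, stdev

# Accuracy (%) per (configuration, debate method), copied from the results table.
# Meta-Debate rows are omitted because their setting is truncated on the page.
RESULTS = {
    ("Claude for all roles", "DMAD"): 58.93,
    ("Claude for all roles", "MAD"): 54.24,
    ("Nova for all roles", "DMAD"): 54.46,
    ("Nova for all roles", "MAD"): 52.46,
    ("Pixtral for all roles", "DMAD"): 50.45,
    ("Pixtral for all roles", "MAD"): 44.64,
    ("Random assignment (Run 1)", "DMAD"): 53.57,
    ("Random assignment (Run 1)", "MAD"): 50.67,
    ("Random assignment (Run 2)", "DMAD"): 58.26,
    ("Random assignment (Run 2)", "MAD"): 52.23,
    ("Random assignment (Run 3)", "DMAD"): 60.27,
    ("Random assignment (Run 3)", "MAD"): 55.58,
}


def dmad_minus_mad(results):
    """Hypothetical helper: DMAD accuracy minus MAD accuracy (percentage points) per configuration."""
    configs = sorted({cfg for cfg, _ in results})
    return {
        cfg: round(results[(cfg, "DMAD")] - results[(cfg, "MAD")], 2)
        for cfg in configs
    }


def random_run_summary(results, method):
    """Hypothetical helper: mean and sample standard deviation over the random-assignment runs."""
    runs = [
        acc
        for (cfg, m), acc in results.items()
        if cfg.startswith("Random assignment") and m == method
    ]
    return round(mean(runs), 2), round(stdev(runs), 2)


if __name__ == "__main__":
    for cfg, gap in dmad_minus_mad(RESULTS).items():
        print(f"{cfg}: DMAD - MAD = {gap:+.2f} pts")
    for method in ("DMAD", "MAD"):
        avg, sd = random_run_summary(RESULTS, method)
        print(f"Random assignment, {method}: mean {avg:.2f}, std {sd:.2f}")
```

On these numbers DMAD scores higher than MAD for every configuration, with gaps ranging from 2.00 points (Nova for all roles) to 6.03 points (Random assignment, Run 2). A sample standard deviation is used because the three random runs are treated as a small sample of possible role assignments.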