Multitask Language Understanding on MMLU-pro (RA)
[Chart: RA by date. Labeled point: CortexDebate, RA 59.33. x-axis: Jul 5, 2025; y-axis: RA.]
Updated 4d ago
Evaluation Results
Method        Type         Date     RA
CortexDebate  Ours         2025.07  59.33
MDM           Part Debate  2025.07  58.9
PRD           Full Debate  2025.07  54.2
PRD           Full Debate  2025.07  54
RECONCILE     Full Debate  2025.07  53.67
RECONCILE     Full Debate  2025.07  53.1
GD            Part Debate  2025.07  51.67
GD            Part Debate  2025.07  51.3
ChatEval      Full Debate  2025.07  49.33
ChatEval      Full Debate  2025.07  49.3
ND            Part Debate  2025.07  49.1
ND            Part Debate  2025.07  48.67
MLD           Full Debate  2025.07  48.4
MLD           Full Debate  2025.07  47.33
MaV           No Debate    2025.07  46.3
MaV           No Debate    2025.07  46