General Multi-domain Reasoning on AIME, MMLU-Pro, MedMCQA, and GPQA
Best average score: 62.43 (SKILL-MOE, Mar 7, 2025)
Evaluation Results
Method                  Category            Date      Average Score
SKILL-MOE               Multi-Model M...    2025.03   62.43
Self-Consistency (SC)   Advanced Sing...    2025.03   58.72
Zero-Shot CoT           Open-Source 3...    2025.03   56.94
Zero-Shot CoT           Open-Source 3...    2025.03   54.28
Self-MoA                Single-Model...     2025.03   54.28
ReConcile               Multi-Model M...    2025.03   53.8
Multi-Agent Debate      Single-Model...     2025.03   53.76
MoA                     Multi-Model M...    2025.03   53.76
Zero-Shot CoT           Open-Source 7...    2025.03   53.62
Zero-Shot CoT           Open-Source 3...    2025.03   53.18
Self-Refine (SR)        Advanced Sing...    2025.03   51.87
Zero-Shot CoT           Open-Source 7...    2025.03   48.04