Multitask Reasoning on MMLU-Pro (Accuracy)
[Chart: Accuracy (MMLU-Pro) by date; plotted point: SPIRAL, 61.1, Jun 30, 2025]
Updated 1mo ago
Evaluation Results
Method                    Configuration              Date     Accuracy (MMLU-Pro)
SPIRAL                    Backbone=Qwen3-8B, Tra...  2025.06  61.1
SPIRAL                    Backbone=DeepSeek-Dist...  2025.06  58.9
SFT                       Backbone=Qwen3-8B, Tra...  2025.06  58.8
SPIRAL                    Backbone=Qwen3-4B, Tra...  2025.06  58.5
SPIRAL                    Backbone=Qwen3-4B, Tra...  2025.06  57.7
DeepSeek-Distill-Qwen-7B  Backbone=DeepSeek-Dist...  2025.06  57.1
Qwen3-8B-Base             Backbone=Qwen3-8B, Eva...  2025.06  55.7
SFT                       Backbone=DeepSeek-Dist...  2025.06  55.6
SFT                       Backbone=Qwen3-4B, Tra...  2025.06  51.3
SPIRAL                    Backbone=Llama-3.1-8B-...  2025.06  50.4
SPIRAL                    Backbone=Octothinker-8...  2025.06  49.3
Llama-3.1-8B-Instruct     Backbone=Llama-3.1-8B-...  2025.06  49.1
SFT                       Backbone=Llama-3.1-8B-...  2025.06  48.9
SFT                       Backbone=Qwen3-4B, Tra...  2025.06  48.8
Qwen3-4B-Base             Backbone=Qwen3-4B, Eva...  2025.06  47.2
SFT                       Backbone=Octothinker-8...  2025.06  39.1
Octothinker-8B-Base       Backbone=Octothinker-8B    2025.06  30.8