General Language Capability on MMLU, GSM8K, and GPQA
[Chart: MMLU Accuracy over time. Highlighted point: MUSE-D, 73.6, May 9, 2026. Available metrics: MMLU Accuracy, GSM8K Accuracy, GPQA Accuracy.]
Updated 22d ago
Evaluation Results
Method         Details                    Date     MMLU Accuracy  GSM8K Accuracy  GPQA Accuracy
MUSE-D         Backbone=Qwen2.5-7B-IT     2026.05  73.6           82.5            32.8
SafeMT         Backbone=Qwen2.5-7B-IT     2026.05  73.5           81              32.6
TRACE          Backbone=Qwen2.5-7B-IT...  2026.05  73.5           83.5            32
TRACE          Backbone=Qwen2.5-7B-IT...  2026.05  73.5           84.5            35.6
Qwen2.5-7B-IT  -                          2026.05  73.3           86.5            34.2