Out-of-Distribution Generalization on MMLU-Redux and GPQA-Diamond
[Chart: Average Accuracy. Plotted point: Qwen3-30B-A3B, 76.7, May 18, 2026. Selectable metrics: Average Accuracy, Average rZE Score, MMLU Accuracy, GPQA Accuracy.]
Updated 15d ago
Evaluation Results
| Method | Base Model | Date | Average Accuracy | Average rZE Score | MMLU Accuracy | GPQA Accuracy |
| Qwen3-30B-A3B | Qwen3-30B-A... | 2026.05 | 76.7 | 0 | 90.1 | 63.3 |
| ZEDA | Qwen3-30B-A... | 2026.05 | 76.2 | 47.2 | 89.2 | 63.2 |
| GLM-4.7-Flash | GLM-4.7-Fla... | 2026.05 | 76.1 | 0 | 89.8 | 62.4 |
| ZEDA | GLM-4.7-Fla... | 2026.05 | 72.9 | 50 | 89 | 56.8 |
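The reported figures are consistent with Average Accuracy being the unweighted mean of MMLU Accuracy and GPQA Accuracy. The page does not state this definition, so treat it as an assumption inferred from the numbers. The sketch below recomputes the mean for each row; the names `rows`, `mean_accuracy`, and `check_rows` are illustrative only, and the base model strings are left truncated as they appear on the page.

```python
# Rows transcribed from the results above:
# (method, base model as shown on the page, reported average, MMLU, GPQA)
rows = [
    ("Qwen3-30B-A3B", "Qwen3-30B-A...", 76.7, 90.1, 63.3),
    ("ZEDA", "Qwen3-30B-A...", 76.2, 89.2, 63.2),
    ("GLM-4.7-Flash", "GLM-4.7-Fla...", 76.1, 89.8, 62.4),
    ("ZEDA", "GLM-4.7-Fla...", 72.9, 89.0, 56.8),
]


def mean_accuracy(mmlu, gpqa):
    """Unweighted mean of the two accuracies (assumed definition, not stated on the page)."""
    return (mmlu + gpqa) / 2


def check_rows(rows, tol=0.05):
    """Return the rows whose reported average differs from the recomputed mean by more than tol."""
    mismatches = []
    for method, base, reported, mmlu, gpqa in rows:
        recomputed = mean_accuracy(mmlu, gpqa)
        if abs(recomputed - reported) > tol:
            mismatches.append((method, base, reported, recomputed))
    return mismatches


mismatches = check_rows(rows)
print("rows that disagree with the assumed definition:", mismatches)
```

The tolerance of 0.05 allows for the one-decimal rounding used in the reported values. The Average rZE Score column is not covered here because the page gives no indication of how it is derived.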