General Language Model Evaluation on Aggregated 11-benchmark suite (Math, Code, IF)

Average Accuracy: 74.9

Qwen3-30B-A3B

Updated 13d ago

Evaluation Results

Method  Links
2026.05  74.9  0     -     -     -
2026.05  74.2  51.2  -     -     -
2026.05  73.3  51.5  -     -     -
2026.05  73    50    -     -     -
2026.05  72.5  0     -     -     -
2026.05  72.3  50    -     -     -
2026.05  71.8  53    -     -     -
2026.05  70.9  50    -     -     -
2026.05  70.9  52.8  -     -     -
2026.05  70.6  50    -     -     -
2026.05  68.1  43.8  -     -     -
2026.05  67.8  37.5  -     -     -
2026.05  57.1  47    -     -     -
2026.05  54.8  51.9  -     -     -
2026.05  50    -     55.5  43.1  41.8
2026.05  49.2  -     54    44    41.6
2026.05  46.1  -     52.9  38.8  34.7
2026.05  45.4  -     52.9  37.6  33.4
2026.05  43.7  -     50.6  37.2  31.8
2026.05  37.7  -     34.9  37.2  44.6
2026.05  36.5  -     34.1  36.3  42