Mathematical Reasoning on CodaSet ID GSM8k (test)

0.964 Accuracy

Qwen3-235B-A22B

[Chart: Accuracy (axis range 0.51368 to 0.86441), May 25, 2026]
Updated 8d ago

Evaluation Results

Method    Links
0.964     3.8    2026.05
0.9559    1.4    2026.05
0.9554    1.3
0.9451
0.9431    1.1    2026.05
0.9427    4.3
0.942     2.4
0.94      5.1    2026.05
0.9393    1.2
0.928     1.6
0.922     4.3    2026.05
0.906     5.2    2026.05
0.901     3.8
0.8783    4.1    2026.05
0.8782    4.2
0.5312