
Mathematical Reasoning on ASDiv (Accuracy)

Accuracy: 0.955

Qwen2.5-14B-Instruct-1M

[Chart: accuracy over time, Feb 15, 2024 to Jan 29, 2026]
Updated 3d ago

Evaluation Results

Date      Accuracy
2025.08   0.955
2025.06   0.953
2025.06   0.953
2025.08   0.952
2025.06   0.951
2025.06   0.949
2025.08   0.947
2025.08   0.944
2024.05   0.943
2025.08   0.939
2024.05   0.931
2025.03   0.93
2024.05   0.926
2024.05   0.926
2024.05   0.926
2024.02   0.926
2025.02   0.924
2025.03   0.924
2025.03   0.923
2024.05   0.922
2025.03   0.92
2024.05   0.918
2025.02   0.918
2026.01   0.917
2025.02   0.916
2025.02   0.912
2025.02   0.91
2026.01   0.909
2026.01   0.909
2026.01   0.909
2026.01   0.909
2026.01   0.909
2026.01   0.908
2026.01   0.908
2025.02   0.907
2025.03   0.907
2026.01   0.907
2026.01   0.907
2025.02   0.906
2026.01   0.905
2025.03   0.904
2026.01   0.904
2024.05   0.903
2026.01   0.902
2024.05   0.901
2024.05   0.899
2026.01   0.899
2026.01   0.899
2026.01   0.899
2026.01   0.898
2026.01   0.897
2026.01   0.897
2026.01   0.896
2025.03   0.894
2026.01   0.893
2026.01   0.891
2026.01   0.891
2024.05   0.889
2026.01   0.888
2026.01   0.888
2026.01   0.887
2026.01   0.886
2026.01   0.884
2026.01   0.884
2026.01   0.884
2026.01   0.884
2026.01   0.884
2024.05   0.883
2025.03   0.881
2026.01   0.878
2024.05   0.877
2026.01   0.876
2024.05   0.875
2026.01   0.874
2026.01   0.869
2026.01   0.869
2024.02   0.868
2026.01   0.868
2026.01   0.867
2025.08   0.866
2026.01   0.865
2025.02   0.864
2026.01   0.862
2026.01   0.86
2026.01   0.859
2026.01   0.858
2026.01   0.858
2026.01   0.858
2026.01   0.858
2026.01   0.857
2026.01   0.857
2026.01   0.856
2024.05   0.855
2026.01   0.855
2026.01   0.855
2026.01   0.855
2026.01   0.854
2025.02   0.853
2026.01   0.853
2024.05   0.851
Showing 100 of 221 rows