Mathematical Reasoning on MGSM-zh (test)
[Chart: Accuracy over time, Feb 5, 2024 to May 23, 2026. Highest point: JT-Safe-V2-35B, 89.6]
Updated 8d ago
Evaluation Results
Method                            Details                     Date      Accuracy
JT-Safe-V2-35B                    Parameters=35B              2026.05   89.6
SOTA with Equivalent Parameters   Model comparison=Equiv...   2026.05   89.2
DeepSeekMath-RL                   Size=7B, Reasoning Mod...   2024.02   79.6
DeepSeekMath-RL                   Size=7B, Reasoning Mod...   2024.02   78.4
DeepSeek-LLM-Chat                 Size=67B, Reasoning Mo...   2024.02   76.4
DeepSeek-LLM-Chat                 Size=67B, Reasoning Mo...   2024.02   74
DeepSeekMath-Instruct             Size=7B, Reasoning Mod...   2024.02   73.2
DeepSeekMath-Instruct             Size=7B, Reasoning Mod...   2024.02   72
MetaMath                          Size=70B, Reasoning Mo...   2024.02   66.4
SeaLLM-v2                         Size=7B, Reasoning Mod...   2024.02   64.8
WizardMath-v1.0                   Size=70B, Reasoning Mo...   2024.02   64.8
ToRA                              Size=34B, Reasoning Mo...   2024.02   41.2