Mathematical Reasoning on GSM-Infinite Hard
Best result: DeepSeek-V3.2, 50.4 accuracy.
[Chart: accuracy over time, most recent entry Jan 6, 2026]
Evaluation Results
| Method | Setting | Date | Accuracy |
|---|---|---|---|
| DeepSeek-V3.2 | Length=16K, #Activated... | 2026.01 | 50.4 |
| DeepSeek-V3.2 | Length=32K, #Activated... | 2026.01 | 45.2 |
| DeepSeek-V3.1 | Length=16K, #Activated... | 2026.01 | 41.5 |
| DeepSeek-V3.1 | Length=32K, #Activated... | 2026.01 | 38.8 |
| MiMo-V2-Flash | Length=16K, #Activated... | 2026.01 | 37.7 |
| DeepSeek-V3.1 | Length=64K, #Activated... | 2026.01 | 34.7 |
| Kimi-K2 | Length=16K, #Activated... | 2026.01 | 34.6 |
| MiMo-V2-Flash | Length=32K, #Activated... | 2026.01 | 33.7 |
| DeepSeek-V3.2 | Length=64K, #Activated... | 2026.01 | 32.6 |
| MiMo-V2-Flash | Length=64K, #Activated... | 2026.01 | 31.5 |
| MiMo-V2-Flash | Length=128K, #Activate... | 2026.01 | 29 |
| DeepSeek-V3.1 | Length=128K, #Activate... | 2026.01 | 28.7 |
| Kimi-K2 | Length=32K, #Activated... | 2026.01 | 26.1 |
| DeepSeek-V3.2 | Length=128K, #Activate... | 2026.01 | 25.7 |
| Kimi-K2 | Length=64K, #Activated... | 2026.01 | 16 |
| Kimi-K2 | Length=128K, #Activate... | 2026.01 | 8.8 |
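Each model appears once per context length, so the results can be regrouped to show how accuracy falls as the context grows. The Python sketch below uses only the accuracies listed on this page. It parses nothing from the setting strings, which are truncated on the page; instead the length values (16K to 128K) are entered by hand. The helper names `by_model` and `retention` are hypothetical and used only for illustration. "Retention" here means 128K accuracy divided by 16K accuracy, a simple ratio chosen for this sketch rather than a metric the leaderboard defines.

```python
from collections import defaultdict

# (model, context length in K tokens, accuracy), copied from the leaderboard rows.
RESULTS = [
    ("DeepSeek-V3.2", 16, 50.4),
    ("DeepSeek-V3.2", 32, 45.2),
    ("DeepSeek-V3.1", 16, 41.5),
    ("DeepSeek-V3.1", 32, 38.8),
    ("MiMo-V2-Flash", 16, 37.7),
    ("DeepSeek-V3.1", 64, 34.7),
    ("Kimi-K2", 16, 34.6),
    ("MiMo-V2-Flash", 32, 33.7),
    ("DeepSeek-V3.2", 64, 32.6),
    ("MiMo-V2-Flash", 64, 31.5),
    ("MiMo-V2-Flash", 128, 29.0),
    ("DeepSeek-V3.1", 128, 28.7),
    ("Kimi-K2", 32, 26.1),
    ("DeepSeek-V3.2", 128, 25.7),
    ("Kimi-K2", 64, 16.0),
    ("Kimi-K2", 128, 8.8),
]


def by_model(results):
    """Hypothetical helper: group scores into {model: {length_k: accuracy}}."""
    table = defaultdict(dict)
    for model, length_k, acc in results:
        table[model][length_k] = acc
    return dict(table)


def retention(scores, short=16, long=128):
    """Hypothetical helper: fraction of short-context accuracy kept at the long length."""
    return scores[long] / scores[short]


if __name__ == "__main__":
    for model, scores in sorted(by_model(RESULTS).items()):
        curve = ", ".join(f"{k}K={scores[k]}" for k in sorted(scores))
        print(f"{model}: {curve} (128K/16K retention {retention(scores):.0%})")
```

By this ratio, MiMo-V2-Flash keeps about 77% of its 16K accuracy at 128K (29 / 37.7). Kimi-K2 keeps about 25% (8.8 / 34.6), even though both models start from similar 16K scores.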