Mathematical Reasoning on GSM-Infinite Hard

50.4 Accuracy

DeepSeek-V3.2

Updated 3mo ago

Evaluation Results

Method	Links	Accuracy
DeepSeek-V3.2 2026.01		50.4
DeepSeek-V3.2 2026.01		45.2
DeepSeek-V3.1 2026.01		41.5
DeepSeek-V3.1 2026.01		38.8
MiMo-V2-Flash 2026.01		37.7
DeepSeek-V3.1 2026.01		34.7
Kimi-K2 2026.01		34.6
MiMo-V2-Flash 2026.01		33.7
DeepSeek-V3.2 2026.01		32.6
MiMo-V2-Flash 2026.01		31.5
MiMo-V2-Flash 2026.01		29
DeepSeek-V3.1 2026.01		28.7
Kimi-K2 2026.01		26.1
DeepSeek-V3.2 2026.01		25.7
Kimi-K2 2026.01		16
Kimi-K2 2026.01		8.8