Long-Context Mathematical Reasoning on GSM-Infinite
[Chart: Accuracy over time. Highlighted point: Gemini-3.0-pro, 87.06, Mar 23, 2026.]
Updated 25d ago
Evaluation Results
Method                                      Date      Accuracy
Gemini-3.0-pro                              2026.03   87.06
Deepseek-v3.1                               2026.03   47.4
Qwen3-32B + TableLong                       2026.03   23.4
Deepseek-R1-Distill-Qwen-32B + TableLong    2026.03   14.8
Qwen2.5-32B-Instruct + TableLong            2026.03   13
Qwen3-32B                                   2026.03   12.22
Qwen2.5-32B-Instruct                        2026.03   9.6
Deepseek-R1-Distill-Qwen-14B + TableLong    2026.03   9.6
Qwen-Long-L1                                2026.03   9
Deepseek-R1-Distill-Qwen-14B                2026.03   9
Deepseek-R1-Distill-Qwen-32B                2026.03   6.6