Mathematical Reasoning on BeyondAIME (accuracy)
[Chart: Accuracy on BeyondAIME over time. Labeled point: TRICE-30B, 82.5 (May 7, 2026).]
Evaluation Results

| Method | Configuration | Date | Accuracy |
| --- | --- | --- | --- |
| TRICE-30B | Tool Usage=true, Param... | 2026.05 | 82.5 |
| TRICE-30B | Tool Use=Yes | 2026.05 | 82.5 |
| GLM-4.7-Flash w/ recipe | Tool Usage=true, Param... | 2026.05 | 81 |
| Nemotron-3-Nano-30B-A3B | Tool Usage=true, Param... | 2026.05 | 80 |
| DeepSeek-V3.2-Thinking | Tool Use=No | 2026.05 | 76.8 |
| GLM-4.7-Flash | Tool Usage=true, Param... | 2026.05 | 76 |
| Qwen3.5-35B-A3B | Tool Usage=false, Para... | 2026.05 | 72.5 |
| Qwen3-235B-A22B-Thinking | Tool Use=No | 2026.05 | 71.8 |
| TRICE-4B | Tool Usage=true, Param... | 2026.05 | 71.3 |
| TRICE-30B | Tool Usage=false, Para... | 2026.05 | 71 |
| Qwen3.5-9B | Tool Usage=false, Para... | 2026.05 | 67.3 |
| Qwen3-30B-A3B-Thinking-2507 | Tool Usage=false, Para... | 2026.05 | 65.9 |
| GPT-OSS-20B | Tool Usage=true, Param... | 2026.05 | 63 |
| ASTER-4B† | Tool Usage=true, Param... | 2026.05 | 61.7 |
| Qwen3.5-4B | Tool Usage=false, Para... | 2026.05 | 58.8 |
| TRICE-4B | Tool Usage=false, Para... | 2026.05 | 58.5 |
| Qwen3-4B-Thinking-2507 | Tool Usage=false, Para... | 2026.05 | 54.3 |
| Qwen3-30B-A3B-Instruct-2507 | Tool Usage=false, Para... | 2026.05 | 51.3 |