Mathematical Reasoning on DEEPMATH-103K
[Chart: Accuracy by date, with Token Count and Cost Percentage as alternate metrics. Top result: 95.5 Accuracy (TRS), Apr 23, 2026.]
Evaluation Results
Method  Model                Date     Accuracy  Token Count  Cost Percentage
TRS     Gemini-3-Flash       2026.04  95.5      6,106        82.5
Direct  Doubao Seed (See...  2026.04  95.1      3,453        100
TRS     Doubao Seed (See...  2026.04  94.9      1,411        46.2
Direct  Gemini-3-Flash       2026.04  94.8      7,490        100
Direct  GPT-OSS-120B         2026.04  93.7      1,257        100
TRS     GPT-OSS-120B         2026.04  93.7      976          83.1
TRS     GPT-4o-mini          2026.04  61.4      650          93.1
Direct  GPT-4o-mini          2026.04  59.6      819          100