Mathematical Reasoning on Polymath Low
[Chart: Accuracy (en). Leading result: 96.8 (Selective translation), Oct 31, 2025. Available metrics: Accuracy for en, de, es, ar, ja, ko, th, bn, sw, te; Average Accuracy; Translator Usage (%).]
Updated 4d ago
Evaluation Results
| Method | Model | Date | Accuracy (en) | Accuracy (de) | Accuracy (es) | Accuracy (ar) | Accuracy (ja) | Accuracy (ko) | Accuracy (th) | Accuracy (bn) | Accuracy (sw) | Accuracy (te) | Average Accuracy | Translator Usage (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Selective translation | gpt-oss-20b | 2025.10 | 96.8 | 85.9 | 91.5 | 90.4 | 85.9 | 90.9 | 90.9 | 88.5 | 84.8 | 83.5 | 88.9 | 16.7 |
| Base | gpt-oss-20b | 2025.10 | 96 | 85.6 | 89.9 | 90.9 | 86.1 | 91.2 | 90.4 | 90.1 | 78.7 | 81.1 | 88 | 0 |
| Full translation | gpt-oss-20b | 2025.10 | 93.9 | 84.5 | 88.3 | 90.4 | 86.4 | 88.5 | 87.7 | 86.9 | 86.4 | 85.6 | 87.9 | 100 |
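
The Average Accuracy column can be checked against the per-language scores. The sketch below assumes the column is the unweighted mean of the ten per-language accuracies; the page does not state this, but the assumption reproduces the reported values to one decimal place. The names `results`, `reported`, and `mean_accuracy` are illustrative only.

```python
# Per-language accuracies copied from the results above, in the order
# en, de, es, ar, ja, ko, th, bn, sw, te.
results = {
    "Selective translation": [96.8, 85.9, 91.5, 90.4, 85.9, 90.9, 90.9, 88.5, 84.8, 83.5],
    "Base": [96, 85.6, 89.9, 90.9, 86.1, 91.2, 90.4, 90.1, 78.7, 81.1],
    "Full translation": [93.9, 84.5, 88.3, 90.4, 86.4, 88.5, 87.7, 86.9, 86.4, 85.6],
}

# Average Accuracy values as reported on the page.
reported = {"Selective translation": 88.9, "Base": 88, "Full translation": 87.9}


def mean_accuracy(scores):
    """Unweighted mean over languages (an assumption, not stated by the source)."""
    return sum(scores) / len(scores)


averages = {method: mean_accuracy(scores) for method, scores in results.items()}

for method, avg in averages.items():
    print(f"{method}: computed {avg:.2f}, reported {reported[method]}")
```

Under that assumption, selective translation's lead over the base method in the average comes mostly from sw (84.8 vs. 78.7) and te (83.5 vs. 81.1), while full translation gains on those two languages but loses accuracy on en, de, and es.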