Mathematical Question Answering on GSM8K (test)
[Chart: RAvg and Coe; highlighted point: CluCERT, RAvg 1.16, Dec 1, 2025]
Updated 3mo ago
Evaluation Results
Method          Details                     Date      RAvg   Coe
CluCERT         Base Model=ChatGPT-3.5...   2025.12   1.16   1.07
CluCERT(-Clu)   Base Model=ChatGPT-3.5...   2025.12   1.05   1.15
SelfDenoise     Base Model=ChatGPT-3.5...   2025.12   0.92   1.25
RanMASK         Base Model=ChatGPT-3.5...   2025.12   0.8    1.35