General Capability Evaluation on Average (MMLU, GSM8K, MBPP)
[Chart: Accuracy by date. Baseline: 78.84 (Mar 16, 2026). Available metrics: Accuracy, Utility Preservation.]
Updated 1mo ago
Evaluation Results
Method     Date      Accuracy   Utility Preservation
Baseline   2026.03   78.84      -
SFCoT      2026.03   71.93      91.2
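The page does not define Utility Preservation. The reported 91.2 is, however, consistent with SFCoT's accuracy expressed as a percentage of the Baseline accuracy (71.93 / 78.84). The sketch below checks that arithmetic; the function name `utility_preservation` and the ratio definition itself are assumptions for illustration, not something the leaderboard states.

```python
def utility_preservation(method_accuracy: float, baseline_accuracy: float) -> float:
    """Return method accuracy as a percentage of baseline accuracy.

    Assumed definition: the source page does not say how the metric is computed.
    """
    if baseline_accuracy <= 0:
        raise ValueError("baseline_accuracy must be positive")
    return 100.0 * method_accuracy / baseline_accuracy


# Values taken from the Evaluation Results table.
baseline_accuracy = 78.84
sfcot_accuracy = 71.93

print(round(utility_preservation(sfcot_accuracy, baseline_accuracy), 1))  # 91.2
```

Under this reading, the Baseline row shows "-" because the ratio of the baseline to itself carries no information.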