Reasoning on Combined Reasoning Benchmarks
[Chart: Overall Accuracy over time. Best result: WIST, 49.4, Mar 22, 2026]
Evaluation Results
Method | Backbone | Date | Overall Accuracy
WIST | Qwen3-14B-Base | 2026.03 | 49.4
SPICE | Qwen3-14B-Base | 2026.03 | 48.6
R-Zero | Qwen3-14B-Base | 2026.03 | 47.5
Base Model | Qwen3-14B-Base | 2026.03 | 46.2