Comprehensive Reasoning on Mathematical and General Reasoning Suite Combined
Overall Accuracy: 46.7 (WIST), Mar 22, 2026
Evaluation Results
Method      Backbone          Date     Overall Accuracy
WIST        Qwen3-8B-Base     2026.03  46.7
SPICE       Qwen3-8B-Base     2026.03  46
R-Zero      Qwen3-8B-Base     2026.03  45.5
WIST        Qwen3-4B-Base     2026.03  43.1
Base Model  Qwen3-8B-Base     2026.03  42.1
SPICE       Qwen3-4B-Base     2026.03  41.8
R-Zero      Qwen3-4B-Base     2026.03  40.6
Base Model  Qwen3-4B-Base     2026.03  33.3
WIST        OctoThinker-8...  2026.03  32.6
SPICE       OctoThinker-8...  2026.03  31.4
R-Zero      OctoThinker-8...  2026.03  30.6
SPICE       OctoThinker-3...  2026.03  25.2
WIST        OctoThinker-3...  2026.03  24.5
Base Model  OctoThinker-8...  2026.03  22.9
R-Zero      OctoThinker-3...  2026.03  22.5
Base Model  OctoThinker-3...  2026.03  13.2
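
The results are ranked by absolute Overall Accuracy, which mixes backbones of different strength. A minimal sketch of how one might instead compare each method against the Base Model on the same backbone is given below. The scores are copied from the results above; the derived gains are not reported by the leaderboard itself, and the names `ROWS` and `gains_over_base` are hypothetical, introduced only for this illustration. Backbone names are kept exactly as displayed, including the truncated ones.

```python
from collections import defaultdict

# (method, backbone, overall accuracy), copied from the results above.
# Truncated backbone names are kept as displayed; they are not expanded.
ROWS = [
    ("WIST", "Qwen3-8B-Base", 46.7),
    ("SPICE", "Qwen3-8B-Base", 46.0),
    ("R-Zero", "Qwen3-8B-Base", 45.5),
    ("WIST", "Qwen3-4B-Base", 43.1),
    ("Base Model", "Qwen3-8B-Base", 42.1),
    ("SPICE", "Qwen3-4B-Base", 41.8),
    ("R-Zero", "Qwen3-4B-Base", 40.6),
    ("Base Model", "Qwen3-4B-Base", 33.3),
    ("WIST", "OctoThinker-8...", 32.6),
    ("SPICE", "OctoThinker-8...", 31.4),
    ("R-Zero", "OctoThinker-8...", 30.6),
    ("SPICE", "OctoThinker-3...", 25.2),
    ("WIST", "OctoThinker-3...", 24.5),
    ("Base Model", "OctoThinker-8...", 22.9),
    ("R-Zero", "OctoThinker-3...", 22.5),
    ("Base Model", "OctoThinker-3...", 13.2),
]


def gains_over_base(rows):
    """Return {backbone: {method: gain}} where gain is the method's
    accuracy minus the Base Model accuracy on the same backbone."""
    base = {backbone: acc for method, backbone, acc in rows
            if method == "Base Model"}
    gains = defaultdict(dict)
    for method, backbone, acc in rows:
        if method != "Base Model":
            # Round to one decimal to match the precision of the table.
            gains[backbone][method] = round(acc - base[backbone], 1)
    return dict(gains)


if __name__ == "__main__":
    for backbone, per_method in gains_over_base(ROWS).items():
        ranked = sorted(per_method.items(), key=lambda kv: -kv[1])
        for method, delta in ranked:
            print(f"{backbone:18s} {method:8s} {delta:+.1f}")
```

Grouping by backbone in this way separates the effect of the training method from the effect of the underlying model, which the flat ranking does not do.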