General Reasoning Generalization on Comprehensive Reasoning Suite
[Chart: Average Score over time. Top result: GHS-TDA, 63.88 (Feb 10, 2026)]
Evaluation Results
Method    Backbone     Date     Average Score
GHS-TDA   Llama 3-8B   2026.02  63.88
AoT       Llama 3-8B   2026.02  62.82
GHS-TDA   Qwen 2-14B   2026.02  62.35
GoT       Llama 3-8B   2026.02  61.10
ToT       Llama 3-8B   2026.02  60.85
AoT       Qwen 2-14B   2026.02  60.81
ToT       Qwen 2-14B   2026.02  59.66
GoT       Qwen 2-14B   2026.02  59.62
CoT       Llama 3-8B   2026.02  56.99
CoT       Qwen 2-14B   2026.02  55.60
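One way to read the evaluation results is to compare each method against CoT on the same backbone and to average each method across the two backbones. The Python snippet below is an illustrative sketch, not part of the benchmark's own tooling: treating CoT as the baseline is an assumption made here, the helper names `gains_over_baseline` and `mean_by_method` are hypothetical, and the scores are transcribed by hand from the results listing.

```python
from collections import defaultdict

# Scores transcribed from the Evaluation Results listing: (method, backbone, average score).
RESULTS = [
    ("GHS-TDA", "Llama 3-8B", 63.88),
    ("AoT", "Llama 3-8B", 62.82),
    ("GHS-TDA", "Qwen 2-14B", 62.35),
    ("GoT", "Llama 3-8B", 61.10),
    ("ToT", "Llama 3-8B", 60.85),
    ("AoT", "Qwen 2-14B", 60.81),
    ("ToT", "Qwen 2-14B", 59.66),
    ("GoT", "Qwen 2-14B", 59.62),
    ("CoT", "Llama 3-8B", 56.99),
    ("CoT", "Qwen 2-14B", 55.60),
]


def gains_over_baseline(results, baseline="CoT"):
    """Hypothetical helper: {backbone: {method: score minus the baseline's score on that backbone}}."""
    base = {backbone: score for method, backbone, score in results if method == baseline}
    gains = defaultdict(dict)
    for method, backbone, score in results:
        if method != baseline:
            # Round to 2 decimals to match the precision of the reported scores.
            gains[backbone][method] = round(score - base[backbone], 2)
    return dict(gains)


def mean_by_method(results):
    """Hypothetical helper: average each method's score over the backbones it was run on."""
    scores = defaultdict(list)
    for method, _, score in results:
        scores[method].append(score)
    return {method: round(sum(v) / len(v), 3) for method, v in scores.items()}


if __name__ == "__main__":
    for backbone, gains in gains_over_baseline(RESULTS).items():
        print(backbone, gains)
    ranking = sorted(mean_by_method(RESULTS).items(), key=lambda kv: kv[1], reverse=True)
    print(ranking)
```

On these numbers, GHS-TDA leads CoT by 6.89 points on Llama 3-8B and by 6.75 points on Qwen 2-14B, and the cross-backbone ranking is GHS-TDA, AoT, GoT, ToT, CoT. GoT and ToT swap places between the two backbones, and every method scores higher with Llama 3-8B than with Qwen 2-14B.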