Reasoning on BBEH
[Chart: pass@1 over time. Top result: Qwen3-8B, 15.31 pass@1 (Jan 8, 2026).]
Evaluation Results
| Method | Details | Date | pass@1 |
|---|---|---|---|
| Qwen3-8B | Backbone=Qwen3-8B | 2026.01 | 15.31 |
| RelayLLM (Simple) | Student Model Backbone... | 2026.01 | 12.67 |
| RelayLLM (Difficulty-Aware) | Student Model Backbone... | 2026.01 | 12.46 |
| CITER | Student Model Backbone... | 2026.01 | 11.67 |
| GRPO | Student Model Backbone... | 2026.01 | 10.89 |
| Base Model | Student Model Backbone... | 2026.01 | 9.91 |
| RelayLLM (Difficulty-Aware) | Student Model Backbone... | 2026.01 | 8.56 |
| RelayLLM (Simple) | Student Model Backbone... | 2026.01 | 8.32 |
| CITER | Student Model Backbone... | 2026.01 | 8.16 |
| GRPO | Student Model Backbone... | 2026.01 | 7.82 |
| Base Model | Student Model Backbone... | 2026.01 | 7.19 |