Reasoning on BBEH
[Chart: pass@1. Qwen3-8B: 15.31 (Jan 8, 2026)]
Evaluation Results
Method                       Details                    Date     pass@1
Qwen3-8B                     Backbone=Qwen3-8B          2026.01  15.31
RelayLLM (Simple)            Student Model Backbone...  2026.01  12.67
RelayLLM (Difficulty-Aware)  Student Model Backbone...  2026.01  12.46
CITER                        Student Model Backbone...  2026.01  11.67
GRPO                         Student Model Backbone...  2026.01  10.89
Base Model                   Student Model Backbone...  2026.01   9.91
RelayLLM (Difficulty-Aware)  Student Model Backbone...  2026.01   8.56
RelayLLM (Simple)            Student Model Backbone...  2026.01   8.32
CITER                        Student Model Backbone...  2026.01   8.16
GRPO                         Student Model Backbone...  2026.01   7.82
Base Model                   Student Model Backbone...  2026.01   7.19