Scientific Reasoning on GPQA Diamond (pass@1, pass@5)
[Chart: Pass@1 and Pass@5 by date. Highlighted point: FA, Pass@1 20.2, Oct 4, 2025.]
Updated 3d ago
Evaluation Results

Method      Setting                    Date     Pass@1  Pass@5
FA          Backbone=Llama-3.2-3B-...  2025.10  20.2    50.61
RS          Backbone=Llama-3.2-3B-...  2025.10  18.13   40.08
ToT         Backbone=Llama-3.2-3B-...  2025.10  16.77   44.44
STaR        Backbone=Llama-3.2-3B-...  2025.10  16.61   38.41
CAA         Backbone=Llama-3.2-3B-...  2025.10  15.66   40.23
Base Model  Backbone=Llama-3.2-3B-...  2025.10  11.62   28.28
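
The page does not say how the Pass@1 and Pass@5 columns were computed. A common choice is the unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n answers are sampled per question and c of them are correct. The sketch below assumes that estimator; the function names and the per-question sample count are illustrative and do not come from this leaderboard.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k answers, drawn without replacement
    from n sampled answers of which c are correct, is correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k answers are wrong,
    # so the result is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over all questions, as a percentage.

    correct_counts[i] is the number of correct answers (out of n samples)
    for question i. This aggregation is an assumption, not the site's method.
    """
    return 100.0 * mean(pass_at_k(n, c, k) for c in correct_counts)
```

Under this estimator Pass@5 can never fall below Pass@1 for the same samples, which matches every row in the table.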