Reasoning on HeadQA
Top result: PSFT, Pass@1 = 77.9
[Chart: Pass@1 over time (Aug 25, 2025)]
Evaluation Results
Method        Backbone          Date     Pass@1
PSFT          Llama3.1-8B-I...  2025.08  77.9
PSFT warm-up  Llama3.1-8B-I...  2025.08  77.71
PSFT          Qwen2.5-7B-In...  2025.08  75.82
PSFT warm-up  Qwen2.5-7B-In...  2025.08  75.24
SFT           Qwen2.5-7B-In...  2025.08  74.65
SFT-KL        Qwen2.5-7B-In...  2025.08  74.07
SFT           Llama3.1-8B-I...  2025.08  73.52
Base          Qwen2.5-7B-In...  2025.08  73.41
SFT-KL        Llama3.1-8B-I...  2025.08  71.95
Base          Llama3.1-8B-I...  2025.08  68.2
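Comparing methods within the same backbone is easier when each score is expressed relative to that backbone's Base model. The following is a minimal Python sketch. It assumes only the Pass@1 numbers in the evaluation results table and computes each method's gain over Base, plus the top method per backbone. The names `RESULTS`, `gains_over_base`, and `best_method` are hypothetical helpers for illustration, not part of the benchmark. The truncated backbone names are kept exactly as the leaderboard shows them.

```python
"""Pass@1 gains over the Base model on HeadQA, per backbone (illustrative sketch)."""

# Hypothetical container: Pass@1 scores copied from the leaderboard table.
# Backbone names are truncated exactly as the leaderboard displays them.
RESULTS: dict[str, dict[str, float]] = {
    "Llama3.1-8B-I...": {
        "Base": 68.2, "SFT-KL": 71.95, "SFT": 73.52,
        "PSFT warm-up": 77.71, "PSFT": 77.9,
    },
    "Qwen2.5-7B-In...": {
        "Base": 73.41, "SFT-KL": 74.07, "SFT": 74.65,
        "PSFT warm-up": 75.24, "PSFT": 75.82,
    },
}


def gains_over_base(scores: dict[str, float]) -> dict[str, float]:
    """Return each non-Base method's Pass@1 minus the Base score, rounded to 2 decimals."""
    base = scores["Base"]
    return {m: round(s - base, 2) for m, s in scores.items() if m != "Base"}


def best_method(scores: dict[str, float]) -> str:
    """Return the method name with the highest Pass@1 for one backbone."""
    return max(scores, key=scores.__getitem__)


if __name__ == "__main__":
    for backbone, scores in RESULTS.items():
        print(f"{backbone}: best = {best_method(scores)}")
        # List methods from largest to smallest gain over Base.
        gains = gains_over_base(scores)
        for method, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
            print(f"  {method:<13} +{gain:.2f}")
```

On these numbers, PSFT is the top method on both backbones, adding 9.70 Pass@1 points over Base on the Llama3.1 backbone and 2.41 points on the Qwen2.5 backbone.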