Expert-level Question Answering on GPQA
[Chart: Mean Score @16. Highlighted entry: OptPO-SFT, 28.9 (Dec 2, 2025). Selectable metrics: Mean Score @16, Pass Rate @16, Pass Rate @1, Total Tokens Generated, Token Saving. Updated 4d ago.]
Evaluation Results
| Method | Backbone | Date | Mean Score @16 | Pass Rate @16 | Pass Rate @1 | Total Tokens Generated | Token Saving |
|---|---|---|---|---|---|---|---|
| OptPO-SFT | Llama-3.1-8B-... | 2025.12 | 28.9 | 82.8 | 32.3 | - | 61.89 |
| TTSFT | Llama-3.1-8B-... | 2025.12 | 28.8 | 85.9 | 30.3 | - | - |
| TTSFT | Qwen2.5-Math-... | 2025.12 | 27.9 | 81.8 | 30.3 | - | - |
| OptPO-SFT | Qwen2.5-Math-... | 2025.12 | 27.6 | 78.8 | 31.3 | - | 51.17 |