Open-ended QA on Open-ended QA
[Chart: Precision by date. Top result: Post-trained, 99.9 Precision, Feb 6, 2026. Metric tabs: Precision, Entropy, Coverage (n).]
Evaluation Results
| Method       | Backbone | Date    | Precision | Entropy | Coverage (n) |
|--------------|----------|---------|-----------|---------|--------------|
| Post-trained | Qwen     | 2026.02 | 99.9      | 0.659   | 8.6          |
| Post-trained | Gemma    | 2026.02 | 99.4      | 0.771   | 9.6          |
| Proxy-Soup   | Gemma    | 2026.02 | 99.3      | 0.967   | 12.1         |
| Post-trained | Llama    | 2026.02 | 98.6      | 1.442   | 20.4         |
| Proxy-Soup   | Llama    | 2026.02 | 98.1      | 1.76    | 28.5         |
| SLR          | Qwen     | 2026.02 | 97.8      | 1.583   | 22.6         |
| SLR          | Gemma    | 2026.02 | 97.8      | 1.827   | 30.7         |
| Proxy-Soup   | Qwen     | 2026.02 | 97.7      | 0.919   | 12.4         |
| SLR          | Llama    | 2026.02 | 96.8      | 2.322   | 46.1         |