Judge Accuracy on Ultra-Problem (Target)
[Chart: Accuracy over time. Plotted point: Seed Rubric, 61.5 accuracy, Feb 14, 2026. Updated 4d ago.]
Evaluation Results
| Method | Links | Accuracy | Delta (Bench - Target) |
|---|---|---|---|
| Seed Rubric (Judge Model=Qwen3-14B) | 2026.02 | 61.5 | 11.3 |
| Biased Rubric Search (Judge Model=Qwen3-14B) | 2026.02 | 57.6 | 15.4 |