Quality Scoring on Software Scenario
[Chart: Average Score over time; top score 100 (Force Strong); latest data point Jan 27, 2026]
Evaluation Results

| Method       | Model    | Date    | Average Score |
|--------------|----------|---------|---------------|
| Force Strong | claude   | 2026.01 | 100           |
| Force Strong | gemini   | 2026.01 | 100           |
| CASTER       | gemini   | 2026.01 | 100           |
| CASTER       | qwen     | 2026.01 | 100           |
| Force Weak   | gemini   | 2026.01 | 99.8          |
| Force Strong | qwen     | 2026.01 | 99.3          |
| Force Strong | deepseek | 2026.01 | 98.6          |
| Force Weak   | deepseek | 2026.01 | 98.3          |
| CASTER       | deepseek | 2026.01 | 98.1          |
| Force Weak   | qwen     | 2026.01 | 97.8          |
| CASTER       | openai   | 2026.01 | 97            |
| CASTER       | claude   | 2026.01 | 96.4          |
| Force Strong | openai   | 2026.01 | 95.3          |
| Force Weak   | claude   | 2026.01 | 93.3          |
| Force Weak   | openai   | 2026.01 | 82.2          |