Quality Scoring on Security Scenario
Chart: Average Score. Top entry: CASTER, 96.2 (Jan 27, 2026).
Evaluation Results
| Method | Model | Date | Average Score |
|---|---|---|---|
| CASTER | gemini | 2026.01 | 96.2 |
| Force Strong | qwen | 2026.01 | 95.9 |
| Force Strong | gemini | 2026.01 | 95.6 |
| CASTER | qwen | 2026.01 | 95.2 |
| CASTER | claude | 2026.01 | 95.1 |
| CASTER | deepseek | 2026.01 | 94.8 |
| Force Weak | gemini | 2026.01 | 94.8 |
| Force Weak | claude | 2026.01 | 94.4 |
| Force Strong | claude | 2026.01 | 94.3 |
| Force Weak | qwen | 2026.01 | 94.1 |
| CASTER | openai | 2026.01 | 93.9 |
| Force Strong | openai | 2026.01 | 93.7 |
| Force Weak | openai | 2026.01 | 92.9 |
| Force Strong | deepseek | 2026.01 | 91.1 |
| Force Weak | deepseek | 2026.01 | 89.3 |