Quality Scoring on Data Scenario
Best Average Score: 84.6 (CASTER)
Chart: Average Score by date (Jan 27, 2026)
Updated 4d ago
Evaluation Results
| Method | Model | Date | Average Score |
|---|---|---|---|
| CASTER | claude | 2026.01 | 84.6 |
| Force Strong | claude | 2026.01 | 83.3 |
| Force Weak | claude | 2026.01 | 80.7 |
| CASTER | deepseek | 2026.01 | 78.1 |
| Force Strong | openai | 2026.01 | 75.5 |
| CASTER | openai | 2026.01 | 75.4 |
| CASTER | qwen | 2026.01 | 73.6 |
| Force Strong | qwen | 2026.01 | 73.5 |
| Force Strong | deepseek | 2026.01 | 73.4 |
| Force Weak | openai | 2026.01 | 73.1 |
| Force Weak | qwen | 2026.01 | 70.4 |
| Force Weak | deepseek | 2026.01 | 69.2 |
| Force Strong | gemini | 2026.01 | 53.6 |
| CASTER | gemini | 2026.01 | 53.6 |
| Force Weak | gemini | 2026.01 | 52.1 |