Quality Scoring on Science Scenario
Best Avg. Score: 97.6 (CASTER)
[Chart: Avg. Score over time (Jan 27, 2026)]
Evaluation Results
Method        Model     Date     Avg. Score
CASTER        qwen      2026.01  97.6
CASTER        deepseek  2026.01  97.5
Force Weak    qwen      2026.01  97.5
Force Strong  claude    2026.01  96.7
Force Strong  qwen      2026.01  96.7
CASTER        claude    2026.01  95.8
CASTER        openai    2026.01  95.4
Force Strong  openai    2026.01  95.3
Force Weak    gemini    2026.01  95.0
CASTER        gemini    2026.01  95.0
Force Weak    claude    2026.01  94.8
Force Strong  gemini    2026.01  94.2
Force Weak    deepseek  2026.01  89.5
Force Weak    openai    2026.01  87.5
Force Strong  deepseek  2026.01  84.6
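The results table lists each method once per model, so the three methods can be compared by aggregating over models. The sketch below is one illustrative way to do that. The scores are transcribed from the table. However, the unweighted per-method mean is not a figure this leaderboard reports; it is derived here only for illustration. The names RESULTS, mean_by_method and best_by_model are hypothetical helpers and are not part of any published tooling.

```python
from collections import defaultdict
from statistics import mean

# Scores transcribed from the Evaluation Results table: (method, model, avg_score).
RESULTS = [
    ("CASTER", "qwen", 97.6),
    ("CASTER", "deepseek", 97.5),
    ("Force Weak", "qwen", 97.5),
    ("Force Strong", "claude", 96.7),
    ("Force Strong", "qwen", 96.7),
    ("CASTER", "claude", 95.8),
    ("CASTER", "openai", 95.4),
    ("Force Strong", "openai", 95.3),
    ("Force Weak", "gemini", 95.0),
    ("CASTER", "gemini", 95.0),
    ("Force Weak", "claude", 94.8),
    ("Force Strong", "gemini", 94.2),
    ("Force Weak", "deepseek", 89.5),
    ("Force Weak", "openai", 87.5),
    ("Force Strong", "deepseek", 84.6),
]


def mean_by_method(rows):
    """Hypothetical helper: unweighted mean Avg. Score per method across all models."""
    groups = defaultdict(list)
    for method, _model, score in rows:
        groups[method].append(score)
    return {method: mean(scores) for method, scores in groups.items()}


def best_by_model(rows):
    """Hypothetical helper: top score per model and every method that reaches it (ties kept)."""
    best = {}
    for method, model, score in rows:
        top_score, methods = best.get(model, (float("-inf"), []))
        if score > top_score:
            best[model] = (score, [method])
        elif score == top_score:
            methods.append(method)  # mutates the list already stored for this model
    return best


if __name__ == "__main__":
    ranked = sorted(mean_by_method(RESULTS).items(), key=lambda kv: -kv[1])
    for method, avg in ranked:
        print(f"{method}: {avg:.2f}")
    # CASTER: 96.26
    # Force Strong: 93.50
    # Force Weak: 92.86
    for model, (score, methods) in best_by_model(RESULTS).items():
        print(f"{model}: {score} ({', '.join(methods)})")
```

Under this simple average, CASTER has the highest mean across the five models. Per model, it tops qwen, deepseek and openai, while Force Strong leads on claude and CASTER ties Force Weak on gemini. An unweighted mean treats every model equally, so a different weighting could order the methods differently.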