Discrimination on PsyCLIENT-CP +content
[Chart: Accuracy (A) and Accuracy (B) plotted against date (Jan 12, 2026). Top Accuracy (A): Claude-Sonnet-3.5, 61.9.]
Evaluation Results
Method                  Setting              Date     Accuracy (A)  Accuracy (B)
Claude-Sonnet-3.5       LLM-as-a-judge=true  2026.01  61.9          36.1
DeepSeek-R1             LLM-as-a-judge=true  2026.01  12.8          8.3
Qwen3-235B-A22B         LLM-as-a-judge=true  2026.01  5.6           10
DeepSeek-V3-0324        LLM-as-a-judge=true  2026.01  1.9           0.6
GPT-4o                  LLM-as-a-judge=true  2026.01  0.8           5.8
Qwen2.5-72B-Instruct    LLM-as-a-judge=true  2026.01  0.3           0.3
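Accuracy (A) and Accuracy (B) do not always order the models the same way, so it can help to rank each column separately. The following is a minimal Python sketch that hard-codes the scores from the Evaluation Results table; it assumes both columns are higher-is-better, and the `RESULTS` mapping and `rank` helper are hypothetical names for illustration, not part of the benchmark's tooling.

```python
"""Rank the methods in the Evaluation Results table by each accuracy column (illustrative sketch)."""

# (Accuracy (A), Accuracy (B)) per method, copied from the table.
# Assumption: both columns are higher-is-better scores on the same scale.
RESULTS: dict[str, tuple[float, float]] = {
    "Claude-Sonnet-3.5": (61.9, 36.1),
    "DeepSeek-R1": (12.8, 8.3),
    "Qwen3-235B-A22B": (5.6, 10.0),
    "DeepSeek-V3-0324": (1.9, 0.6),
    "GPT-4o": (0.8, 5.8),
    "Qwen2.5-72B-Instruct": (0.3, 0.3),
}


def rank(column: int) -> list[str]:
    """Return method names sorted from highest to lowest score.

    column: 0 selects Accuracy (A), 1 selects Accuracy (B).
    """
    return sorted(RESULTS, key=lambda name: RESULTS[name][column], reverse=True)


if __name__ == "__main__":
    print("By Accuracy (A):", rank(0))
    print("By Accuracy (B):", rank(1))
```

Run on these values, both rankings put Claude-Sonnet-3.5 first and Qwen2.5-72B-Instruct last, but the middle four differ: for example, Qwen3-235B-A22B is third by Accuracy (A) and second by Accuracy (B).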