Discrimination on PsyCLIENT-CP vanilla
Top result: Claude-Sonnet-3.5, 83.1 Accuracy (A) (Jan 12, 2026)
Metrics: Accuracy (A), Accuracy (B)
Evaluation Results

Method                 Setting               Date      Accuracy (A)   Accuracy (B)
Claude-Sonnet-3.5      LLM-as-a-judge=true   2026.01   83.1           0.622
DeepSeek-R1            LLM-as-a-judge=true   2026.01   27.8           0.158
Qwen3-235B-A22B        LLM-as-a-judge=true   2026.01   10.8           0.111
DeepSeek-V3-0324       LLM-as-a-judge=true   2026.01   7.2            0.039
GPT-4o                 LLM-as-a-judge=true   2026.01   5              0.072
Qwen2.5-72B-Instruct   LLM-as-a-judge=true   2026.01   1.4            0.008