Discrimination on PsyCLIENT-CP +behavior
[Chart: Accuracy (A) and Accuracy (B). Highlighted point: DeepSeek-R1, Accuracy (A) 26.1, Jan 12, 2026.]
Evaluation Results
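| Method               | Setting             | Link (date) | Accuracy (A) | Accuracy (B) |
|----------------------|---------------------|-------------|--------------|--------------|
| DeepSeek-R1          | LLM-as-a-judge=true | 2026.01     | 26.1         | 9.2          |
| Claude-Sonnet-3.5    | LLM-as-a-judge=true | 2026.01     | 14.4         | 7.5          |
| Qwen3-235B-A22B      | LLM-as-a-judge=true | 2026.01     | 14.2         | 17.2         |
| DeepSeek-V3-0324     | LLM-as-a-judge=true | 2026.01     | 1.4          | 0.3          |
| GPT-4o               | LLM-as-a-judge=true | 2026.01     | 0.6          | 0.6          |
| Qwen2.5-72B-Instruct | LLM-as-a-judge=true | 2026.01     | 0            | 0            |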