Discrimination on PsyCLIENT-CP
[Chart: Accuracy (A) and Accuracy (B) by date. Top result shown: DeepSeek-R1, 16.4 Accuracy (A). Date shown: Jan 12, 2026.]
Updated 3d ago
Evaluation Results
| Method | Setting | Date | Accuracy (A) | Accuracy (B) |
| --- | --- | --- | --- | --- |
| DeepSeek-R1 | LLM-as-a-judge=true | 2026.01 | 16.4 | 3.6 |
| Qwen3-235B-A22B | LLM-as-a-judge=true | 2026.01 | 12.7 | 14.7 |
| Claude-Sonnet-3.5 | LLM-as-a-judge=true | 2026.01 | 5.6 | 2.2 |
| GPT-4o | LLM-as-a-judge=true | 2026.01 | 0.8 | 0.3 |
| DeepSeek-V3-0324 | LLM-as-a-judge=true | 2026.01 | 0.3 | 0 |
| Qwen2.5-72B-Instruct | LLM-as-a-judge=true | 2026.01 | 0 | 0 |