Discrimination on PsyCLIENT-CP Human Client
[Chart: Accuracy (A) and Accuracy (B) by date; labeled point: Qwen2.5-72B-Instruct, Jan 12, 2026]
Updated 3d ago
Evaluation Results
| Method               | Settings                  | Date    | Accuracy (A) | Accuracy (B) |
|----------------------|---------------------------|---------|--------------|--------------|
| Qwen2.5-72B-Instruct | LLM-as-a-judge=true, B... | 2026.01 | 100          | 100          |
| GPT-4o               | LLM-as-a-judge=true, B... | 2026.01 | 98.1         | 98.6         |
| DeepSeek-V3-0324     | LLM-as-a-judge=true, B... | 2026.01 | 97.8         | 100          |
| Claude-Sonnet-3.5    | LLM-as-a-judge=true, B... | 2026.01 | 95           | 98.6         |
| Qwen3-235B-A22B      | LLM-as-a-judge=true, B... | 2026.01 | 79.4         | 78.6         |
| DeepSeek-R1          | LLM-as-a-judge=true, B... | 2026.01 | 72.2         | 87.8         |