Response Selection on P-Soups Expertise
[Chart: Accuracy over time. Top entry: Qwen3-32B (thinking), 83.66 Accuracy, Jan 8, 2026]
Updated 4d ago
Evaluation Results
Method                    Inference Setting   Date      Accuracy
Qwen3-32B (thinking)      Full...             2026.01   83.66
Qwen3-8B (thinking)       Full...             2026.01   83.5
Qwen3-8B (thinking)       Stre...             2026.01   82.66
ALIGNXPLORE+              Full...             2026.01   82.5
DeepSeek-R1-671B          Full...             2026.01   82.33
Qwen3-32B (thinking)      Stre...             2026.01   81.67
GPT-OSS-20B               Full...             2026.01   81.66
DeepSeek-R1-671B          Stre...             2026.01   79
ALIGNXPLORE+              Stre...             2026.01   78.5
GPT-OSS-20B               Stre...             2026.01   77.5
ALIGNXPLORE               Full...             2026.01   72.66
ALIGNXPLORE               Stre...             2026.01   69.16
DS-R1-Distill-Qwen-7B     Full...             2026.01   66
DS-R1-Distill-Qwen-7B     Stre...             2026.01   64.16
TALLRec                   Dire...             2026.01   60.16
Qwen3-8B (non-thinking)   Dire...             2026.01   38.33