Response Selection on P-Soups Style
[Chart: Accuracy by date. Top score: 0.88, Qwen3-32B (thinking). Plotted date: Jan 8, 2026.]
Updated 4d ago
Evaluation Results
| Method | Inference Setting | Date | Accuracy |
| --- | --- | --- | --- |
| Qwen3-32B (thinking) | Full... | 2026.01 | 0.88 |
| Qwen3-8B (thinking) | Full... | 2026.01 | 0.875 |
| ALIGNXPLORE+ | Full... | 2026.01 | 0.8633 |
| GPT-OSS-20B | Full... | 2026.01 | 0.86 |
| DeepSeek-R1-671B | Full... | 2026.01 | 0.8566 |
| DeepSeek-R1-671B | Stre... | 2026.01 | 0.85 |
| Qwen3-8B (thinking) | Stre... | 2026.01 | 0.85 |
| Qwen3-32B (thinking) | Stre... | 2026.01 | 0.8366 |
| GPT-OSS-20B | Stre... | 2026.01 | 0.83 |
| ALIGNXPLORE+ | Stre... | 2026.01 | 0.8033 |
| ALIGNXPLORE | Full... | 2026.01 | 0.78 |
| ALIGNXPLORE | Stre... | 2026.01 | 0.7483 |
| TALLRec | Dire... | 2026.01 | 0.7016 |
| DS-R1-Distill-Qwen-7B | Full... | 2026.01 | 0.6583 |
| DS-R1-Distill-Qwen-7B | Stre... | 2026.01 | 0.6083 |
| Qwen3-8B (non-thinking) | Dire... | 2026.01 | 0.4233 |