Response Selection on P-Soups Informativeness
Best result: 78.07 Accuracy (ALIGNXPLORE+)
[Chart: Accuracy over time (x-axis date: Jan 8, 2026)]
Evaluation Results
Method                   Inference Setting   Date      Accuracy
ALIGNXPLORE+             Full...             2026.01   78.07
ALIGNXPLORE+             Stre...             2026.01   76.57
ALIGNXPLORE              Full...             2026.01   76.24
Qwen3-8B thinking        Full...             2026.01   75.08
ALIGNXPLORE              Stre...             2026.01   74.41
Qwen3-8B thinking        Stre...             2026.01   74.08
Qwen3-32B thinking       Stre...             2026.01   73.58
Qwen3-32B thinking       Full...             2026.01   73.25
DeepSeek-R1-671B         Full...             2026.01   72.59
GPT-OSS-20B              Stre...             2026.01   69.93
GPT-OSS-20B              Full...             2026.01   68.77
DeepSeek-R1-671B         Stre...             2026.01   66.61
DS-R1-Distill-Qwen-7B    Stre...             2026.01   58.63
DS-R1-Distill-Qwen-7B    Full...             2026.01   56.14
TALLRec                  Dire...             2026.01   51.66
Qwen3-8B non-thinking    Dire...             2026.01   46.84