Response Generation on HiCUPID
[Chart: Accuracy over time. Top result: 63.9 (DeepSeek-R1-671B), Jan 8, 2026]
Evaluation Results
| Method | Inference Setting | Date | Accuracy |
|---|---|---|---|
| DeepSeek-R1-671B | Full... | 2026.01 | 63.9 |
| Qwen3-32B (thinking) | Full... | 2026.01 | 63.44 |
| ALIGNXPLORE+ | Full... | 2026.01 | 62.42 |
| GPT-OSS-20B | Full... | 2026.01 | 62 |
| ALIGNXPLORE+ | Stre... | 2026.01 | 60.51 |
| DeepSeek-R1-671B | Stre... | 2026.01 | 60.32 |
| Qwen3-8B (thinking) | Full... | 2026.01 | 60.05 |
| DS-R1-Distill-Qwen-7B | Full... | 2026.01 | 60.01 |
| GPT-OSS-20B | Stre... | 2026.01 | 59.93 |
| Qwen3-32B (thinking) | Stre... | 2026.01 | 59.83 |
| DS-R1-Distill-Qwen-7B | Stre... | 2026.01 | 59.29 |
| Qwen3-8B (thinking) | Stre... | 2026.01 | 59.17 |
| ALIGNXPLORE | Full... | 2026.01 | 53.5 |
| ALIGNXPLORE | Stre... | 2026.01 | 50.34 |
| TALLRec | Dire... | 2026.01 | 47.41 |
| Qwen3-8B (non-thinking) | Dire... | 2026.01 | 47.02 |