Vision-Language Reasoning on SugarCrepe (test)
[Chart: Simple Accuracy. Best result: 62.75, Q4 system redistr (prop), Jan 18, 2026. Other metrics available: Paired Accuracy, Yes Rate.]
Evaluation Results
Method                    Method details             Date     Simple Accuracy  Paired Accuracy  Yes Rate
Q4 system redistr (prop)  Intervention=Q4 system...  2026.01  62.75            25.93            85.71
Q4 text redistr (prop)    Intervention=Q4 text r...  2026.01  58.79            17.58            90.66
Image×2.0                 Intervention=Image×2.0     2026.01  58.18            16.37            91.37
Q4 system abl             Intervention=Q4 system...  2026.01  57.97            16.04            91.59
No intervention baseline  Intervention=None          2026.01  57.53            15.16            92.03
PAI                       Intervention=PAI           2026.01  57.47            15.05            91.98
AD-HH                     Intervention=AD-HH         2026.01  57.14            14.4             92.53