Vision-Language Reasoning on NaturalBench (test)
Top result: 66.02 Simple Accuracy, Q4 system redistr (prop), Jan 18, 2026.
Metrics reported: Simple Accuracy, Paired Accuracy, Yes Rate.
Evaluation Results
Method                     Intervention   Date      Simple Accuracy   Paired Accuracy   Yes Rate
Q4 system redistr (prop)   Q4 system...   2026.01   66.02             34.59             66.67
Q4 text redistr (prop)     Q4 text r...   2026.01   62.17             25.52             82.07
Image×2.0                  Image×2.0      2026.01   61.28             23.72             83.83
Q4 system abl              Q4 system...   2026.01   60.97             23                84.55
No intervention baseline   None           2026.01   60.22             21.55             85.67
PAI                        PAI            2026.01   59.97             21.1              86.03
AD-HH                      AD-HH          2026.01   59.79             20.66             86.34
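
To make the gaps between rows easier to read, here is a minimal Python sketch that subtracts the "No intervention baseline" row from every other row. The scores are copied from the results listed above. Using that row as the reference point, and the names `RESULTS`, `BASELINE`, and `deltas_vs_baseline`, are assumptions made for illustration and are not part of the benchmark.

```python
# Scores copied from the results above, in the order
# (Simple Accuracy, Paired Accuracy, Yes Rate).
RESULTS = {
    "Q4 system redistr (prop)": (66.02, 34.59, 66.67),
    "Q4 text redistr (prop)": (62.17, 25.52, 82.07),
    "Image×2.0": (61.28, 23.72, 83.83),
    "Q4 system abl": (60.97, 23.0, 84.55),
    "No intervention baseline": (60.22, 21.55, 85.67),
    "PAI": (59.97, 21.1, 86.03),
    "AD-HH": (59.79, 20.66, 86.34),
}

# Assumption: the "No intervention baseline" row is the reference point.
BASELINE = "No intervention baseline"


def deltas_vs_baseline(results, baseline=BASELINE):
    """Return per-metric differences (method minus baseline), rounded to 2 places."""
    base = results[baseline]
    return {
        name: tuple(round(score - ref, 2) for score, ref in zip(scores, base))
        for name, scores in results.items()
        if name != baseline
    }


DELTAS = deltas_vs_baseline(RESULTS)

# Print one line per method with signed differences for the three metrics.
for name, (d_simple, d_paired, d_yes) in DELTAS.items():
    print(f"{name:<26} {d_simple:+7.2f} {d_paired:+7.2f} {d_yes:+7.2f}")
```

The differences are in the same units as the listed scores. A positive value means the method scored higher than the baseline row on that metric.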