Vision-Language Reasoning on BEAF (test)
Chart: Simple Accuracy (also selectable: Paired Accuracy, Yes-Rate). Top entry: Q4 system redistr (prop), 88.4 (Jan 18, 2026).
Evaluation Results
Method                    Details                    Date     Simple Accuracy  Paired Accuracy  Yes-Rate
Q4 system redistr (prop)  Intervention=Q4 system...  2026.01  88.4             84.74            32.62
Image×2.0                 Intervention=Image×2.0     2026.01  85.73            81.01            41.55
Q4 text redistr (prop)    Intervention=Q4 text r...  2026.01  85.55            80.75            41.68
Q4 system abl             Intervention=Q4 system...  2026.01  84.44            79.45            43.58
No intervention baseline  Intervention=None          2026.01  83.74            78.62            44.69
AD-HH                     Intervention=AD-HH         2026.01  83.67            78.51            44.74
PAI                       Intervention=PAI           2026.01  83.01            77.72            45.73
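Each intervention can be compared against the no-intervention baseline by plain subtraction. The sketch below hard-codes the scores from the Evaluation Results table (they are not fetched from the leaderboard) and reports each method's per-metric difference from the baseline; the names RESULTS, METRICS, and deltas_vs_baseline are illustrative only. This page does not say whether a lower Yes-Rate is better, so the sign is reported without interpretation.

```python
# Scores copied by hand from the Evaluation Results table (BEAF, test split).
# Tuple order: (Simple Accuracy, Paired Accuracy, Yes-Rate).
RESULTS = {
    "Q4 system redistr (prop)": (88.40, 84.74, 32.62),
    "Image×2.0": (85.73, 81.01, 41.55),
    "Q4 text redistr (prop)": (85.55, 80.75, 41.68),
    "Q4 system abl": (84.44, 79.45, 43.58),
    "No intervention baseline": (83.74, 78.62, 44.69),
    "AD-HH": (83.67, 78.51, 44.74),
    "PAI": (83.01, 77.72, 45.73),
}
METRICS = ("Simple Accuracy", "Paired Accuracy", "Yes-Rate")
BASELINE = "No intervention baseline"


def deltas_vs_baseline(results, baseline=BASELINE):
    """Return {method: per-metric score minus baseline score}, rounded to 2 decimals."""
    base = results[baseline]
    return {
        name: tuple(round(score - ref, 2) for score, ref in zip(scores, base))
        for name, scores in results.items()
        if name != baseline  # the baseline's delta would be all zeros
    }


if __name__ == "__main__":
    deltas = deltas_vs_baseline(RESULTS)
    # Print methods sorted by Simple Accuracy gain, largest first.
    for name, d in sorted(deltas.items(), key=lambda kv: kv[1][0], reverse=True):
        cols = "  ".join(f"{m}: {v:+.2f}" for m, v in zip(METRICS, d))
        print(f"{name:<26}{cols}")
```

For example, the top row, Q4 system redistr (prop), differs from the baseline by +4.66 Simple Accuracy, +6.12 Paired Accuracy, and -12.07 Yes-Rate.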