Visual Reasoning on Out-of-Domain (OOD) Aggregate (HalluBench, MathVista, MathVerse, MathVision)
[Chart: OOD Avg Accuracy. Best result: SaEI, 0.5531 (Dec 11, 2025). Updated 3d ago.]
Evaluation Results
Method                   Finetuning status   Date      OOD Avg Accuracy
SaEI                     Fine...             2025.12   0.5531
KL-Cov                   Fine...             2025.12   0.5474
Vanilla GRPO             Fine...             2025.12   0.547
NoisyRollout             Fine...             2025.12   0.5368
Qwen2.5-VL-7B-Instruct   Not...              2025.12   0.4999
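
The gap between each method and the base model can be read off the results above. The sketch below is illustrative only: it copies the listed OOD Avg Accuracy values and assumes that Qwen2.5-VL-7B-Instruct, the one entry whose finetuning status begins with "Not", is the reference point. The helper name `gains_over_baseline` is hypothetical and not part of the leaderboard.

```python
# Scores copied from the Evaluation Results listing (OOD Avg Accuracy).
scores = {
    "SaEI": 0.5531,
    "KL-Cov": 0.5474,
    "Vanilla GRPO": 0.547,
    "NoisyRollout": 0.5368,
    "Qwen2.5-VL-7B-Instruct": 0.4999,
}

# Assumption: the non-fine-tuned entry is the reference for all other rows.
BASELINE = "Qwen2.5-VL-7B-Instruct"


def gains_over_baseline(scores, baseline):
    """Return {method: absolute accuracy gain over the baseline}, rounded to 4 places."""
    base = scores[baseline]
    return {
        name: round(value - base, 4)
        for name, value in scores.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, gain in gains_over_baseline(scores, BASELINE).items():
        print(f"{name}: +{gain:.4f}")
```

Under that assumption, SaEI sits about 0.0532 above the base model, and the four fine-tuned entries are within roughly 0.016 of one another.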