Visual Reasoning on Out-of-Domain (OOD) Aggregate (HalluBench, MathVista, MathVerse, MathVision)
Chart: OOD Avg Accuracy. Best result: SaEI, 0.5531 (Dec 11, 2025).
Updated 1mo ago
Evaluation Results
Method                  Finetuning status  Date     OOD Avg Accuracy
SaEI                    Fine...            2025.12  0.5531
KL-Cov                  Fine...            2025.12  0.5474
Vanilla GRPO            Fine...            2025.12  0.547
NoisyRollout            Fine...            2025.12  0.5368
Qwen2.5-VL-7B-Instruct  Not...             2025.12  0.4999
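To make the gaps between entries easier to read, the short Python sketch below computes each method's gain over Qwen2.5-VL-7B-Instruct. The scores are copied from the results above. Treating Qwen2.5-VL-7B-Instruct as the reference row is an assumption made here, based only on it being the one entry whose finetuning status begins with "Not". The helper name `gains_over_baseline` is hypothetical and not part of the benchmark.

```python
# OOD Avg Accuracy values copied from the Evaluation Results listing.
scores = {
    "SaEI": 0.5531,
    "KL-Cov": 0.5474,
    "Vanilla GRPO": 0.547,
    "NoisyRollout": 0.5368,
    "Qwen2.5-VL-7B-Instruct": 0.4999,
}

# Assumption: this row is used as the reference point for comparison.
BASELINE = "Qwen2.5-VL-7B-Instruct"


def gains_over_baseline(scores, baseline):
    """Return the absolute accuracy difference of every other entry vs. the baseline."""
    base = scores[baseline]
    return {
        name: round(score - base, 4)
        for name, score in scores.items()
        if name != baseline
    }


gains = gains_over_baseline(scores, BASELINE)

# Print the entries from largest to smallest gain.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} +{gain:.4f}  ({gain * 100:.2f} accuracy points)")
```

On these numbers the top entry sits roughly 5.3 accuracy points above the reference row, while the three leading methods are within about 0.6 points of one another.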