
Visual Reasoning on Geo3K

Accuracy: 56.74

Qwen2.5-VL-32B-Instruct + NoisyRollout

[Chart: accuracy results, Mar 11, 2026]

Evaluation Results

Date      Accuracy (%)
2026.03   56.74
2026.03   53.58
2026.03   51.75
2026.03   51.4
2026.03   50.58
2026.03   50.08
2026.03   48.09
2026.03   39.77
2026.03   32.95
2026.03   26.46