Relational Reasoning on W-UP
[Chart: Accuracy (%) by date. Top entry: THINKLITE-VL, 98.3, May 22, 2026.]
Updated 9d ago
Evaluation Results
| Method | Configuration | Date | Accuracy (%) |
|---|---|---|---|
| THINKLITE-VL | Method=THINKLITE-VL | 2026.05 | 98.3 |
| QWEN2.5-VL-7B + PGT | Backbone=QWEN2.5-VL-7B... | 2026.05 | 98 |
| INTERNVL3-8B + PGT | Backbone=INTERNVL3-8B,... | 2026.05 | 97.9 |
| IMAGE JIGSAW | Method=IMAGE JIGSAW | 2026.05 | 97.4 |
| INTERNVL3-8B | Backbone=INTERNVL3-8B,... | 2026.05 | 97.2 |
| QWEN2.5-VL-7B | Backbone=QWEN2.5-VL-7B... | 2026.05 | 96.8 |
| QWEN2.5-VL-7B + SPECIALIZED MIX | Backbone=QWEN2.5-VL-7B... | 2026.05 | 96.4 |
| VIGORL-3B | Method=VIGORL-3B | 2026.05 | 96.2 |
| QWEN2.5-VL-3B + PGT | Backbone=QWEN2.5-VL-3B... | 2026.05 | 96.1 |
| LLAVA-NEXT-LLAMA3-8B | Backbone=LLAVA-NEXT-LL... | 2026.05 | 93.8 |
| LLAVA-NEXT-LLAMA3-8B + PGT | Backbone=LLAVA-NEXT-LL... | 2026.05 | 93.8 |
| QWEN2.5-VL-3B | Backbone=QWEN2.5-VL-3B... | 2026.05 | 93.8 |
| QWEN2.5-VL-3B + SPECIALIZED MIX | Backbone=QWEN2.5-VL-3B... | 2026.05 | 93.7 |
| LLAVA-NEXT-7B + PGT | Backbone=LLAVA-NEXT-7B... | 2026.05 | 90.7 |
| SPATIAL-LADDER-3B | Method=SPATIAL-LADDER-3B | 2026.05 | 88.8 |
| LLAVA-NEXT-7B | Backbone=LLAVA-NEXT-7B... | 2026.05 | 85.2 |