Vision-Language Reasoning on ScanQA ScanNet scenes (test)
[Chart: BLEU-1 by date. Plotted point: DINOv3 + SpatialBoost, 43.3 BLEU-1, Mar 23, 2026.]
Evaluation Results
Method                    Method details              Links    BLEU-1
DINOv3 + SpatialBoost     Base Model=DINOv3, Eva...   2026.03  43.3
SigLIPv2 + SpatialBoost   Base Model=SigLIPv2, E...   2026.03  40.8
DINOv3                    Evaluation=Unified pro...   2026.03  40.6
PE-Core                   Encoder Type=Vision-La...   2026.03  40.5
DINOv2 + SpatialBoost     Base Model=DINOv2, Eva...   2026.03  40.3
dino.txt                  Encoder Type=Vision-La...   2026.03  39.8
DINOv2                    Evaluation=Unified pro...   2026.03  39.5
OpenCLIP + SpatialBoost   Base Model=OpenCLIP, E...   2026.03  39.2
SigLIPv2                  Evaluation=Unified pro...   2026.03  38.1
V-JEPAv2                  Encoder Type=Vision-on...   2026.03  37.7
AIMv2                     Encoder Type=Vision-La...   2026.03  37.4
TIPS                      Encoder Type=Vision-La...   2026.03  37.4
OpenCLIP                  Evaluation=Unified pro...   2026.03  36.9