Vision-Language Reasoning on SQA3D ScanNet scenes (test)
Top score: 54.9 BLEU-1, DINOv3 + SpatialBoost (Mar 23, 2026)
Evaluation Results
| Method | Settings | Date | BLEU-1 |
| --- | --- | --- | --- |
| DINOv3 + SpatialBoost | Base Model=DINOv3, Spa... | 2026.03 | 54.9 |
| PE-Core | Encoder Type=Vision-La... | 2026.03 | 51.7 |
| DINOv3 | | 2026.03 | 51.4 |
| dino.txt | Encoder Type=Vision-La... | 2026.03 | 50.4 |
| DINOv2 + SpatialBoost | Base Model=DINOv2, Spa... | 2026.03 | 50.4 |
| SigLIPv2 + SpatialBoost | Base Model=SigLIPv2, S... | 2026.03 | 50.1 |
| OpenCLIP + SpatialBoost | Base Model=OpenCLIP, S... | 2026.03 | 49.9 |
| DINOv2 | | 2026.03 | 49.8 |
| TIPS | Encoder Type=Vision-La... | 2026.03 | 49.2 |
| V-JEPAv2 | Encoder Type=Vision-on... | 2026.03 | 49 |
| SigLIPv2 | | 2026.03 | 48.5 |
| AIMv2 | Encoder Type=Vision-La... | 2026.03 | 48.1 |
| OpenCLIP | | 2026.03 | 48 |