Zero-shot Image-Text Retrieval on MSCOCO
Best result: 82.8 Accuracy (SigLIP2-ViT-B/16)
[Chart: Accuracy; Mar 26, 2026]
Updated 22d ago
Evaluation Results
Method                   Details                     Date      Accuracy
SigLIP2-ViT-B/16         Backbone=ViT-B/16           2026.03   82.8
C^2LIP                   Backbone=ViT-B/16           2026.03   82.7
SigLIP ViT-L/16-res256   Backbone=ViT-L/16, Inp...   2026.03   82.3
CLIPS (ViT-L/14)         Backbone=ViT-L/14           2026.03   82.1
BLIP-B                   Backbone=ViT-B              2026.03   79.1
CLIP-A (ViT-L/14)        Backbone=ViT-L/14           2026.03   78.1
FLAVA                    Backbone=ViT-B              2026.03   73.9