Image-to-Text Retrieval on COCO 2017 (val)
[Chart: Recall@1 over time. Best result: 38.78 (CLIP-Refine), Apr 17, 2025]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Recall@1 | Recall@5 | Recall@10 |
|---|---|---|---|---|---|
| CLIP-Refine | Backbone=ViT-B/32, Zer... | 2025.04 | 38.78 | 65.04 | 75.12 |
| HyCD | Backbone=ViT-B/32, Zer... | 2025.04 | 37.88 | 62.54 | 72.84 |
| Self-KD | Backbone=ViT-B/32, Zer... | 2025.04 | 35.36 | 59 | 69.72 |
| HyCD + Lalign | Backbone=ViT-B/32, Zer... | 2025.04 | 35.14 | 61.74 | 72.06 |
| Pre-trained (CLIP) | Backbone=ViT-B/32, Zer... | 2025.04 | 33.26 | 59.1 | 68.78 |
| m²-mix | Backbone=ViT-B/32, Zer... | 2025.04 | 32.92 | 57.96 | 67.56 |
| Contrastive | Backbone=ViT-B/32, Zer... | 2025.04 | 31.86 | 56.8 | 67.48 |