Fine-grained Image-Text Alignment on FG-OVD
[Chart: Accuracy (Hard) over time. Top entry: MulCLIP, 19.24, Dec 8, 2025. Selectable metrics: Accuracy (Hard), Accuracy (Medium), Accuracy (Easy), Accuracy (Trivial), Average Accuracy.]
Updated 1mo ago
Evaluation Results
Method    Configuration               Links    Accuracy (Hard)  Accuracy (Medium)  Accuracy (Easy)  Accuracy (Trivial)  Average Accuracy
MulCLIP   Backbone=ViT-B/16, Fin...   2025.12  19.24            40.73              47.57            68.89               44.11
GOAL      Backbone=ViT-B/16, Fin...   2025.12  18.65            39.66              44.50            72.78               43.90
FineLIP   Backbone=ViT-B/16, Fin...   2025.12  18.17            38.68              41.96            73.79               43.15
W/o WPR   Backbone=ViT-B/16, Fin...   2025.12  17.38            38.51              45.42            68.41               42.43
W/o SAP   Backbone=ViT-B/16, Fin...   2025.12  16.56            37.84              43.03            65.84               40.82
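
The Average Accuracy column appears to be the unweighted mean of the four difficulty levels (Hard, Medium, Easy, Trivial), rounded to two decimals. The page does not state this; it is an inference from the reported numbers. The sketch below recomputes the mean for each row and compares it with the reported value. The names `ROWS`, `mean_accuracy`, and `check_rows` are illustrative and not part of any benchmark tooling.

```python
# Scores copied from the results table:
# (Hard, Medium, Easy, Trivial, reported Average Accuracy).
ROWS = {
    "MulCLIP": (19.24, 40.73, 47.57, 68.89, 44.11),
    "GOAL":    (18.65, 39.66, 44.50, 72.78, 43.90),
    "FineLIP": (18.17, 38.68, 41.96, 73.79, 43.15),
    "W/o WPR": (17.38, 38.51, 45.42, 68.41, 42.43),
    "W/o SAP": (16.56, 37.84, 43.03, 65.84, 40.82),
}


def mean_accuracy(scores):
    """Unweighted arithmetic mean of the per-difficulty accuracies."""
    return sum(scores) / len(scores)


def check_rows(rows, tol=0.005):
    """Map each method to (recomputed mean, reported average, match flag).

    Assumption: the reported average is the plain mean rounded to two
    decimals, so the two values should differ by less than `tol`.
    """
    result = {}
    for method, (*levels, reported) in rows.items():
        recomputed = mean_accuracy(levels)
        result[method] = (recomputed, reported, abs(recomputed - reported) < tol)
    return result


if __name__ == "__main__":
    for method, (recomputed, reported, ok) in check_rows(ROWS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{method:8s} recomputed={recomputed:.4f} reported={reported:.2f} {status}")
```

Under this assumption, all five rows agree with the reported averages to within rounding. Although MulCLIP has the highest average, FineLIP and GOAL score higher on the Trivial level, so the ranking depends on which column is used.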