Single-conditional Image Retrieval on Stanford40
[Chart: Action Accuracy; highlighted point: CLAY, 72.2, Apr 13, 2026. Available metrics: Action Accuracy, Location Accuracy, Mood Accuracy.]
Updated 1mo ago
Evaluation Results
Method       | Configuration             | Date    | Action Accuracy | Location Accuracy | Mood Accuracy
CLAY         | Base Model=SigLIP-L, B... | 2026.04 | 72.2            | 60                | 62
CLAY         | Base Model=CLIP-L, Bac... | 2026.04 | 68.9            | 55.5              | 63.2
CLAY         | Backbone=SigLIP-B         | 2026.04 | 66.2            | 59.5              | 58.7
CLAY         | Backbone=CLIP-B           | 2026.04 | 66              | 55.4              | 57.9
InstructBLIP |                           | 2026.04 | 63.1            | 54.4              | 60.3
SigLIP-L     | Backbone=ViT-L            | 2026.04 | 56.5            | 51.3              | 56.7
SigLIP-B     | Backbone=SigLIP-B         | 2026.04 | 54.8            | 52.7              | 56.4
MagicLens    |                           | 2026.04 | 52.6            | 47.5              | 55.4
GeneCIS      |                           | 2026.04 | 50              | 50.9              | 51.8
CLIP-L       | Backbone=ViT-L            | 2026.04 | 45.3            | 44.1              | 54
CLIP-B       | Backbone=CLIP-B           | 2026.04 | 43              | 47                | 53
SEARLE       |                           | 2026.04 | 40.5            | 43.2              | 50