Text-to-Audio Retrieval (via Vision pivot) on AVE to MSRVTT (test)
Best result: 34.1 mAP (IB+BB)
[Chart: mAP by date; data point at Feb 6, 2026]
Updated 4d ago
Evaluation Results
Method   Details                     Date      mAP
IB+BB    Backbone=ImageBind, Va...   2026.02   34.1
LB+BB    Backbone=LanguageBind,...   2026.02   33.4
IB+BB    Backbone=ImageBind, Va...   2026.02   25.6
LB+BB    Backbone=LanguageBind,...   2026.02   24.8
LB+FT    Backbone=LanguageBind,...   2026.02   14.1
IB+FT    Backbone=ImageBind, Va...   2026.02   11
LB+FT    Backbone=LanguageBind,...   2026.02   10.6
LB       Backbone=LanguageBind,...   2026.02   10.1
IB       Backbone=ImageBind, Va...   2026.02   9.2
LB       Backbone=LanguageBind,...   2026.02   8.9
IB+FT    Backbone=ImageBind, Va...   2026.02   5.7
IB       Backbone=ImageBind, Va...   2026.02   5.3