| Dataset | Best model | Metric | Value | Count |
|---|---|---|---|---|
| AudioCaps (test) | KCL | R@1 | 66.59 | 145 |
| Clotho (test) | AuroLA | R@1 | 28.3 | 62 |
| VALOR | PE-AV | R@1 | 36.4 | 24 |
| AudioCaps | InternVideo2-6B | R@1 | 55.2 | 19 |
| Clotho T→A | PEAV S | R@1 | 24 | 15 |
| Clotho V1 | InternVideo2-6B | R@1 | 25.3 | 15 |
| Clotho V2 (test) | CLAP (Microsoft) | R@1 | 4.61 | 13 |
| Clotho V2 | InternVideo2-6B | R@1 (%) | 27.2 | 13 |
| Auto-ACD (test) | AuroLA | R@1 | 42.3 | 10 |
| Clotho 1K 1.0 (test) | VAST | R@1 | 26.9 | 10 |
| AudioCaps 1K 1.0 (test) | VAST | R@1 | 52 | 10 |
| EPIC-Sounds | AuroLA | mAP | 17 | 8 |
| HD-EPIC | AuroLA | mAP | 10.7 | 8 |
| VGGSounder | AuroLA | mAP | 33.8 | 8 |
| Soundscape Datasets (HSN, NES, SNE, UHH, PER, SSW) 2024 (OOD) | text-observation hashing framework | HSN Score | 40.69 | 5 |
| iNatSounds 2024 (val) | BioLingual with 256-bit Hashing | mAP@1000 (Amphi) | 50.05 | 5 |
| Clotho | CyCLAPS | R@1 | 0.1208 | 4 |
| AVE | BrokenBind | Accuracy | 28.7 | 3 |
| MSRVTT | BrokenBind | Accuracy | 37.1 | 3 |
| Sound (test) | Speech-CLAP | R@1 | 13.33 | 3 |
| Speech (test) | Speech-CLAP | R@1 | 7.1 | 3 |
| UHH | BioLingual | mAP@1000 | 32 | 2 |
| SSW | BioLingual-FT | mAP@1000 | 59.29 | 2 |
| SNE | BioLingual-FT | mAP@1000 | 42.75 | 2 |
| PER | BioLingual | mAP@1000 | 9.73 | 2 |
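Most rows in this table are ranked by R@1 (Recall@1). In text-to-audio retrieval, R@1 is the share of text queries whose correct clip is ranked first. The sketch below shows how R@K is commonly computed from a query-by-candidate similarity matrix. It assumes that query `i` has exactly one correct candidate, at index `i`, and that ties are broken in the query's favour. Both are illustrative assumptions, not the protocol of any specific benchmark above, and the function name and input are hypothetical. The sketch reports R@K as a percentage. The CyCLAPS row (0.1208) appears to report the same metric as a fraction instead.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Recall@K in percent (hypothetical helper).

    Assumes similarity[i][j] scores text query i against audio clip j,
    and that clip i is the only correct match for query i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank of the correct clip = number of clips scored strictly higher
        # (ties therefore count in the query's favour).
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 similarity matrix (made-up scores for illustration only).
sim = [
    [0.9, 0.1, 0.3],  # correct clip ranked 1st
    [0.2, 0.4, 0.8],  # correct clip ranked 2nd
    [0.1, 0.5, 0.7],  # correct clip ranked 1st
]
print(recall_at_k(sim, 1))  # 2 of 3 queries hit at K=1
print(recall_at_k(sim, 2))  # all queries hit at K=2
```

Benchmarks differ in how they handle multiple reference captions per clip (for example, Clotho has several captions per audio file). A real evaluation would need that mapping instead of the one-to-one diagonal assumed here.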