Temporal Grounding (Human Vocalization) on Audioset
[Chart: Acc (40ms) by date. Top entry: Gemini 2.5 Flash, 6.6, Feb 10, 2026. Available metrics: Acc (40ms), Acc (100ms), MAD.]
Updated 1mo ago
Evaluation Results
Method            Inference Mode  Date     Acc (40ms)  Acc (100ms)  MAD
Gemini 2.5 Flash  Zero-shot       2026.02  6.6         12.7         2.3
Audio Flamingo 3  Zero-shot       2026.02  5.8         10.2         1.69
Qwen2.5 7B        Zero-shot       2026.02  3.9         5.7          3.52
Qwen2.5 3B        Zero-shot       2026.02  3.6         4.5          4.46
Voxtral 24B       Zero-shot       2026.02  2.3         4.2          4.03
GPT-4o Audio      Zero-shot       2026.02  1.9         2.8          3.8
Voxtral 3B        Zero-shot       2026.02  1.2         2.2          3.74