Temporal Grounding (Speaker Diarization) on Libricount
[Chart: Accuracy (40ms Tol). Highlighted result: Audio Flamingo 3, 2.3 (Feb 10, 2026). Metrics: Accuracy (40ms Tol), Accuracy (100ms Tol), MAD.]
Updated 1mo ago
Evaluation Results
Method            Inference Mode  Date     Accuracy (40ms Tol)  Accuracy (100ms Tol)  MAD
Audio Flamingo 3  Zero-shot       2026.02  2.3                  6.3                   1.53
Gemini 2.5 Flash  Zero-shot       2026.02  1.1                  2.3                   2.69
Qwen2.5 3B        Zero-shot       2026.02  0.6                  1.9                   3.71
Voxtral 24B       Zero-shot       2026.02  0.4                  1.4                   4.91
Qwen2.5 7B        Zero-shot       2026.02  0.4                  1.5                   5.29
Voxtral 3B        Zero-shot       2026.02  0.3                  1.2                   6.02
GPT-4o Audio      Zero-shot       2026.02  0.2                  1.1                   4.6