Temporal Grounding (Word Alignment) on Librispeech
[Chart: Accuracy (20ms tolerance) over time. Data point shown: Gemini 2.5 Flash, 7.7 (Feb 10, 2026). Selectable metrics: Accuracy (20ms tolerance), Accuracy (40ms tolerance), MAD.]
Updated 1mo ago
Evaluation Results
| Method | Inference Mode | Date | Accuracy (20ms tolerance) | Accuracy (40ms tolerance) | MAD |
|---|---|---|---|---|---|
| Gemini 2.5 Flash | Zero-shot | 2026.02 | 7.7 | 13.8 | 0.28 |
| Voxtral 3B | Zero-shot | 2026.02 | 0.8 | 1.1 | 2.52 |
| Qwen2.5 7B | Zero-shot | 2026.02 | 0.7 | 1.1 | 2.17 |
| Audio Flamingo 3 | Zero-shot | 2026.02 | 0.6 | 1.4 | 1.82 |
| GPT-4o Audio | Zero-shot | 2026.02 | 0.3 | 0.5 | 2.42 |
| Voxtral 24B | Zero-shot | 2026.02 | 0.2 | 0.5 | 2.12 |
| Qwen2.5 3B | Zero-shot | 2026.02 | 0.1 | 0.2 | 2.59 |