| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| MFA-Labeled Raw (test) | Qwen3-ForcedAligner-0.6B | AAS Latency (Avg): 42.9 | 8 | 3mo ago | |
| LibriSpeech Other | WhisperX | AAS: 96.64 | 5 | 2d ago | |
| LibriSpeech Clean | | AAS: 87.05 | 5 | 2d ago | |
| GTSinger-Speech-ZH | WhisperX | AAS: 221.29 | 5 | 2d ago | |
| MFA-labeled Long-form (test) | LLM-ForcedAligner | Average Alignment Value: 52.9 | 4 | 3mo ago | |
| Human-Labeled (test) | | Avg. RTF: 0.0067 | 4 | 3mo ago | |
| MFA-Labeled Concat-300s (test) | Qwen3-ForcedAligner-0.6B | AAS (Avg) [ms]: 52.9 | 4 | 3mo ago | |
| human-labeled Chinese datasets (Mixed-300s) | Monotonic-Aligner | AAS: 410.8 | 3 | 3mo ago | |
| human-labeled Chinese datasets (Mixed-60s) | | AAS: 86.7 | 3 | 3mo ago | |
| human-labeled Chinese datasets (Raw-Noisy) | | AAS: 0.895 | 3 | 3mo ago | |
| human-labeled Chinese datasets (Raw) | | AAS: 88.6 | 3 | 3mo ago | |
| (test) | M26 | Mean Absolute Error (ms): 15.9 | 2 | 1mo ago | |
| (val) | M26 | Mean Absolute Error (ms): 13.39 | 2 | 1mo ago | |
| (train) | M26 | Mean Absolute Error (ms): 13.64 | 2 | 1mo ago | |
| Randomly selected audio files and transcriptions Manual Inspection | FASA | AU Count: 81 | 2 | 3mo ago | |
| human-labeled Chinese and MFA-labeled multilingual speech Mixed-Crosslingual | LLM-ForcedAligner | AAS: 42.5 | 1 | 3mo ago | |