| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Automatic Speech Recognition | mTEDx Es-En v1.0 (test) | ASR chrF: 58.8 | 7 |
| Cross-lingual Text-to-Speech | mTEDx (test) | Naturalness MOS: 4.11 | 4 |
| Waveform generation | mTEDx (test) | LSD: 1.01 | 3 |
| Audio-Visual Speech-to-Speech Translation | mTEDx X-En | Audio Quality (AQ): 3.97 | 2 |