| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio-driven Talking Head Generation | RAVDESS (cross-identity) | FAD 1.885 | 48 |
| Talking Head Generation | RAVDESS intra-identity 1.0 | FAD 0.833 | 48 |
| Audio-Driven Facial Animation | RAVDESS 42 (test) | PSNR 30.772 | 24 |
| Emotion Recognition | RAVDESS (val) | Accuracy 97.46 | 20 |
| Emotion Recognition | RAVDESS 7-class | WAR 83.61 | 19 |
| Speech Emotion Recognition | RAVDESS | Weighted Accuracy 92.08 | 19 |
| Emotion Recognition | RAVDESS | Accuracy 72 | 19 |
| Discrete Emotion Recognition | RAVDESS 19 (test) | Accuracy 44.04 | 19 |
| Emotion Recognition | RAVDESS (test) | Accuracy 0.9735 | 17 |
| Audio Classification | RAVDESS | Base Accuracy 63.55 | 13 |
| Speech Emotion Recognition | RAVDESS 8 classes (test) | Weighted Accuracy 84.72 | 12 |
| Speech Emotion Recognition | RAVDESS In-Domain v1 (test) | Accuracy 85.74 | 12 |
| Song Emotion Recognition | RAVDESS Song | Weighted Accuracy 85.8 | 11 |
| Audiovisual Emotion Recognition | RAVDESS | Accuracy (AV) 81.58 | 11 |
| Emotion Recognition | RAVDESS (6-fold cross-val) | Accuracy 74.86 | 9 |
| Speech Emotion Recognition | RAVDESS (6-fold subject-independent cross-validation) | Weighted Accuracy (WA) 93.4 | 8 |
| Facial Emotion Recognition | RAVDESS | WAR 87.99 | 8 |
| Dynamic Facial Expression Recognition | RAVDESS 7-class | WAR 83.69 | 8 |
| Audio Classification | RAVDESS (test) | Accuracy 0.4596 | 7 |
| Self Reenactment | RAVDESS | PSNR 26.5507 | 6 |
| Emotion Recognition | RAVDESS (speaker-independent) | Accuracy 51.7 | 6 |
| Portrait Image Animation | RAVDESS | Sync-C 5.223 | 6 |
| Gender Classification | RAVDESS | Weighted Accuracy 100 | 5 |
| Affective Computing | RAVDESS 8cl/en (test) | UAR 22.8 | 4 |
| Audio Classification | RAVDESS (10-fold cross-val) | Accuracy 75.4 | 4 |