| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Audio-Visual Classification | CREMA-D (test) | Accuracy | 79.7 | 60 |
| Speech Emotion Recognition | CREMA-D (test) | Accuracy | 68.12 | 24 |
| Emotion Recognition | CREMA-D | Accuracy (6) | 68.4 | 23 |
| Discrete Emotion Recognition | CREMA-D 18 (test) | Accuracy | 55.01 | 19 |
| Emotion Classification | CREMA-D | F1 (Macro) | 77.9 | 18 |
| Emotion Recognition | CREMA-D 6-class | WAR | 79.36 | 17 |
| Audio Classification | CREMA-D 6 | Top-1 Accuracy | 43.3 | 15 |
| Mixed-Emotion Text-to-Speech | CREMA-D (in-distribution) | Embedding Similarity (E-SIM) | 0.795 | 15 |
| Audio Classification | CREMA-D | Accuracy | 73 | 15 |
| Multimodal Classification | CREMA-D (test) | Multi Accuracy | 77.6 | 14 |
| Categorical Emotion Recognition | CREMA-D | UAR | 85.71 | 14 |
| Multimodal Classification | CREMA-D | Accuracy | 77.92 | 12 |
| Speech Emotion Recognition | CREMA-D 6 classes (test) | Weighted Accuracy (WA) | 75.2 | 12 |
| Emotion Recognition | CREMA-D | WA (Weighted Average) | 56 | 12 |
| Speech Emotion Recognition | CREMA-D | Weighted Accuracy | 95.24 | 12 |
| Audio Emotion Recognition | CREMA-D | Accuracy | 70.47 | 11 |
| Classification | CREMA-D (test) | Accuracy | 75.17 | 10 |
| Audio Classification | CREMA-D (test) | Accuracy | 45.06 | 9 |
| Talking Face Generation | CREMA-D | FID | 5.29 | 9 |
| Talking Face Generation | CREMA-D (test) | SSIM | 1 | 8 |
| Dynamic Facial Expression Recognition | CREMA-D 6-class (test) | WAR | 85.03 | 8 |
| Emotion Distribution Estimation | CREMA-D | NLL (MA) | 0.606 | 7 |
| Classification | CREMA-D | Audio Accuracy | 61.11 | 6 |
| Age Classification | CREMA-D | WA | 40.6 | 5 |
| Emotion Recognition | CREMA-D (test) | Accuracy | 75.71 | 5 |