| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Audio Classification | Expresso emo | Top-1 Accuracy | 70.1 | 15 |
| Response Appropriateness | Expresso | Response Appropriateness | 77 | 9 |
| Sentiment Classification | Expresso | Accuracy | 74 | 9 |
| Voice Conversion | Expresso OOD | F0 Correlation | 0.543 | 6 |
| In-context Text-to-Speech | Expresso Expr | Sim-o | 0.603 | 3 |
| Text-to-Speech | Expresso Expr | - | - | 0 |