| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Rebuttal Generation | Human Evaluation Set (100 comments) 1.0 (test) | Attitude Score | 9.92 | 8 |
| Query Auto-Completion | Human Evaluation Set | Item-wise Score | 69.9 | 4 |
| Style Customization | Human evaluation set Generated texts (test) | Content Score | 79 | 4 |
| Simultaneous Speech-to-Speech Translation | Human Evaluation Set French short-form | Audio Quality MOS | 64.5 | 3 |
| Video Generation | Human evaluation set 15 videos (test) | Image Prompt Alignment | 4.4 | 3 |
| Single-Attribute Controlled Text Generation | Human Evaluation Set | Quality Score | 4.2 | 3 |
| Machine Translation | Human Evaluation set en-pt 1.0 (test) | Gender Agreement | 2.78 | 3 |
| Machine Translation | Human Evaluation set en-pl 1.0 (test) | Gender Agreement | 2.64 | 3 |
| Machine Translation | Human Evaluation set en-ja 1.0 (test) | Gender Agreement | 2.97 | 3 |
| Machine Translation | Human Evaluation set en-hi 1.0 (test) | Gender Agreement | 2.89 | 3 |
| Machine Translation | Human Evaluation set en-fr 1.0 (test) | Gender Agreement | 2.96 | 3 |
| Machine Translation | Human Evaluation set en-ar 1.0 (test) | Gender Agreement | 2.79 | 3 |
| Simultaneous Speech-to-Speech Translation | Human Evaluation Set German short-form | Audio Quality | 73.5 | 2 |
| Simultaneous Speech-to-Speech Translation | Human Evaluation Set Portuguese short-form | Audio Quality | 62 | 2 |
| Simultaneous Speech-to-Speech Translation | Human Evaluation Set Spanish short-form | Audio Quality (MOS) | 66.8 | 2 |