| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Controllable video generation | Cooking 50 real-world videos (test) | PSNR | 16.44 | 6 |
| Interleaved generation | Cooking-200 Text Input | T-Com | 4.02 | 5 |
| Interleaved generation | Cooking-200 | T-Com | 4.24 | 5 |
| Cross-task Generalization | Cooking (test) | Similarity | 0.6889 | 4 |
| Action Alignment | Cooking 2 (test) | Midpoint Score | 10.6 | 4 |
| Human Preference Evaluation | Cooking | Step Faithfulness Win Rate | 94 | 3 |