| Task Name | Dataset Name | SOTA Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Single-object 4D Motion Generation | User Study Single-object 4D Motion Generation 1.0 (test) | Prompt Alignment | 47 | 36 |
| Image Editing | User Study 100 images (test) | User Selection Rate | 94.3 | 32 |
| Image Style Transfer | User Study | Overall Quality Score | 83.9 | 30 |
| Talking head synthesis | User Study | Lip Sync Quality | 4.46 | 18 |
| Qualitative Interface Comparison | User Study (N=24) (between-subjects) | Mentions | 10 | 17 |
| Image Personalization | User Study Personalization Tasks | Concept Preservation (CP) | 95.3 | 17 |
| Task-Oriented Robot-Human Handover | User Study Franka Panda | Failure Rate | 37 | 16 |
| Image Inpainting | User Study 40 random images (test) | UOM | 1.6 | 12 |
| Text Alignment | User Study | Average Ranking | 1.54 | 12 |
| Facial Reconstruction | User Study | ID Consistency | 4.85 | 10 |
| Subjective Image Quality Assessment | User Study (test) | Average Rank | 1.17 | 10 |
| Style Transfer | User Study 10 content images, 8 style images (test) | Style Score | 54.6 | 9 |
| Visual Dubbing | User Study | Realism | 4.4 | 9 |
| Character Animation | User Study 20 identities and 20 driving videos (test) | Video Quality | 0.9 | 9 |
| Human Video Generation | User Study | Motion Quality | 30.26 | 8 |
| Indoor Scene Synthesis | User Study | Visual Quality | 4.1 | 8 |
| Identity-consistent video generation | User Study 15 identities | Face Similarity Score | 3.837 | 8 |
| Speech-Preserving Facial Expression Manipulation | User Study (test) | Realism | 65 | 8 |
| Geometric Image Editing | User Study 1.0 (test) | Move Count | 384 | 8 |
| Text-to-3D | User Study 68 text-to-3D cases Human Evaluation | Selection Count | 905 | 8 |
| Coarse-grained attribute binding | User Study 10 prompts (test) | User Preference Frequency | 92.6 | 8 |
| Video Generation | User Study (test) | Video Quality Score | 49.23 | 8 |
| HDR content generation | User Study (test) | User Preference Score | 94.52 | 8 |
| Image-to-3D Generation | User Study (test) | Multi-view Consistency | 9.26 | 8 |
| 3D Human Generation | User Study 30 prompts | Q1 Best Preference Rate | 78 | 8 |