| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multi-frame Visual Story Generation | ConsiStory+ | CLIP-T | 90.74 | 12 |
| Consistent Text-to-Image Generation | ConsiStory+ evaluation prompts | Human Preference Rate | 0.23 | 8 |
| Consistent Text-to-Image Generation | ConsiStory+ | Wins | 42.67 | 5 |
| Story Generation | ConsiStory-Human 1.0 (test) | CLIP-T Score | 35.5 | 5 |