EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
About
Recent advances in foundation models highlight a clear trend toward unification and scaling, showing emergent capabilities across diverse domains. While image generation and editing have rapidly transitioned from task-specific to unified frameworks, video generation and editing remain fragmented due to architectural limitations and data scarcity. In this work, we introduce EditVerse, a unified framework for image and video generation and editing within a single model. By representing all modalities, i.e., text, image, and video, as a unified token sequence, EditVerse leverages self-attention to achieve robust in-context learning, natural cross-modal knowledge transfer, and flexible handling of inputs and outputs with arbitrary resolutions and durations. To address the lack of video editing training data, we design a scalable data pipeline that curates 232K video editing samples and combines them with large-scale image and video datasets for joint training. Furthermore, we present EditVerseBench, the first benchmark for instruction-based video editing covering diverse tasks and resolutions. Extensive experiments and user studies demonstrate that EditVerse achieves state-of-the-art performance, surpassing existing open-source and commercial models, while exhibiting emergent editing and generation abilities across modalities.
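As a rough illustration of the unified token sequence idea, the sketch below lays a text instruction, a source video, and a reference image end to end in one sequence and records where each segment starts. It is a minimal sketch under stated assumptions, not the EditVerse tokenizer: the `Segment` class, the `visual_tokens` and `build_sequence` helpers, the patch size of 16, and the example resolutions are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    modality: str    # "text", "image", or "video"
    num_tokens: int  # number of tokens this segment contributes
    start: int = 0   # offset in the unified sequence, filled in by build_sequence


def visual_tokens(frames: int, height: int, width: int, patch: int = 16) -> int:
    """Count tokens for a visual input cut into non-overlapping patch x patch tiles.

    Assumption: height and width are multiples of `patch`; an image is
    treated as a one-frame video.
    """
    if height % patch or width % patch:
        raise ValueError("height and width must be multiples of patch")
    return frames * (height // patch) * (width // patch)


def build_sequence(segments: List[Segment]) -> Tuple[List[Segment], int]:
    """Concatenate segments in order and return them with offsets plus the total length."""
    offset = 0
    placed = []
    for seg in segments:
        placed.append(Segment(seg.modality, seg.num_tokens, offset))
        offset += seg.num_tokens
    return placed, offset


# Inputs with different resolutions and durations share one sequence.
instruction = Segment("text", 12)  # hypothetical instruction length in tokens
source_video = Segment("video", visual_tokens(frames=8, height=256, width=384))
reference_image = Segment("image", visual_tokens(frames=1, height=512, width=512))

layout, total = build_sequence([instruction, source_video, reference_image])
for seg in layout:
    print(seg.modality, seg.start, seg.num_tokens)
print("total tokens:", total)
```

Because every input reduces to a variable-length run of tokens, changing a resolution or frame count only changes a segment's length, and self-attention over the full sequence can relate any segment to any other.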
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Editing | ImgEdit-Bench | Overall Score | 3.42 | 224 |
| Image Generation | GenEval (test) | GenEval Score | 82 | 48 |
| Video Editing | TGVE benchmark | ViCLIPdir | 22.5 | 20 |
| Video Editing | V2VBench | Frames Quality | 4.957 | 17 |
| Video Generation | VBench | Total Score | 80.97 | 13 |
| Video Editing | EditVerseBench Appearance (test) | Pick Score | 20.06 | 12 |
| Video Editing | EditVerse latest (full) | Editing Quality | 7.64 | 11 |
| Video Editing | EditVerseBench 125 videos | CLIP Score | 98.6 | 11 |
| Video Editing | EditVerseBench (test) | Quality Score | 7.65 | 8 |
| Video Editing | EditVerse-Bench 120-case source-video | Overall Score | 7.52 | 8 |