Gloria: Consistent Character Video Generation via Content Anchors

About

Digital characters are central to modern media, yet generating long-duration character videos with consistent multi-view appearance and expressive identity remains challenging. Existing approaches either provide insufficient context to preserve identity or use non-character-centric information as memory, leading to suboptimal consistency. Recognizing that character video generation inherently resembles an outside-looking-in scenario, we propose representing the character's visual attributes through a compact set of anchor frames. This design provides stable references for consistency, but reference-based video generation inherently faces the challenges of copy-pasting and multi-reference conflicts. To address these, we introduce two mechanisms: Superset Content Anchoring, which provides intra- and extra-training clip cues to prevent duplication, and RoPE as Weak Condition, which encodes positional offsets to distinguish multiple anchors. Furthermore, we construct a scalable pipeline to extract these anchors from massive videos. Experiments show that our method generates high-quality character videos exceeding 10 minutes and achieves expressive identity and appearance consistency across views, surpassing existing methods.
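
The abstract states only that positional offsets are encoded through RoPE to distinguish multiple anchors. It gives no implementation details. As a rough illustration of the general idea, the sketch below applies standard 1-D rotary position embedding (RoPE) in NumPy and places two anchor tokens with identical content at different position offsets. The function name `rope`, the constant `ANCHOR_OFFSET`, the token shapes, and the offset scheme are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply standard 1-D rotary position embedding to x of shape (n, d), d even."""
    n, d = x.shape
    half = d // 2
    # One rotation frequency per feature pair (i, i + half).
    freqs = base ** (-np.arange(half) / half)
    angles = positions[:, None] * freqs[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

rng = np.random.default_rng(0)
d = 8

# Hypothetical setup: 4 video-frame query tokens and 2 anchor key tokens.
# The two anchors share identical content, so only position can tell them apart.
video_q = rng.standard_normal((4, d))
anchor_k = np.tile(rng.standard_normal((1, d)), (2, 1))

video_pos = np.arange(4, dtype=float)
ANCHOR_OFFSET = 100.0  # assumed value, not from the paper
# Assumed scheme: each anchor sits at its own offset beyond the last video frame.
anchor_pos = video_pos[-1] + ANCHOR_OFFSET * (1 + np.arange(2))

# Attention logits between video queries and anchor keys after RoPE.
scores = rope(video_q, video_pos) @ rope(anchor_k, anchor_pos).T
print(scores.shape)
```

Because RoPE makes each query-key logit depend on the relative position between the two tokens, the two anchors receive different logits in this sketch even though their content is identical. How the paper actually chooses the offsets, and what makes the condition "weak", is not specified in the abstract.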

Yuhang Yang, Fan Zhang, Huaijin Pi, Shuai Guo, Guowei Xu, Wei Zhai, Yang Cao, Zheng-Jun Zha • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Talking Head Generation | Foundation capability evaluation set | IQA | 4.65 | 7
Multi-view appearance and expressive identity consistency | Multi-view appearance and expressive identity consistency (evaluation set) | DINO-I Score | 82.1 | 6
Long-term Video Consistency | VBench | Sub. Consistency Score | 96 | 4