VideoAgent: Personalized Synthesis of Scientific Videos

About

The technical complexity of research papers often limits their reach, necessitating more accessible formats like scientific videos to disseminate key insights through engaging narration. However, existing automated methods primarily focus on static posters or slide presentations that remain template-bound and linear. Shifting to audience-adaptive video synthesis requires addressing non-linear narrative orchestration and the joint synchronization of disparate multimodal assets. We introduce VideoAgent, a modular framework that redefines scientific video synthesis as an intent-driven planning problem. By decoupling content understanding from multimodal synthesis, VideoAgent adaptively interleaves static slides with dynamic animations to match the semantic density of the narration. We further propose SciVidEval, a benchmark evaluating multimodal quality and pedagogical utility through automated metrics and human knowledge transfer studies. Extensive experiments demonstrate that VideoAgent effectively conveys complex technical logic with high narrative fidelity and communicative impact.

Xiao Liang, Bangxin Li, Zixuan Chen, Hanyue Zheng, Zhi Ma, Di Wang, Cong Tian, Quan Wang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video-Quiz Evaluation | SciVidEval | VLM-as-Judge Score: 99.5 | 10 |
| Visual Quality Evaluation | SciVidEval | VLM-as-Judge Score: 8.03 | 9 |
| Narration Quality Evaluation | SciVidEval | Perplexity (PPL): 18.08 | 8 |
| Synchronization Evaluation | SciVidEval | CLIP Score: 0.635 | 7 |
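
The table reports narration quality as perplexity and synchronization as a CLIP score, but this page does not say how either is computed. As a rough sketch of the textbook definitions only (not necessarily the procedure SciVidEval uses), perplexity is the exponential of the negative mean token log-probability, and CLIP-style scores are commonly based on the cosine similarity between an image embedding and a text embedding. The function names and toy inputs below are illustrative assumptions; a real evaluation would obtain log-probabilities from a language model and embeddings from a CLIP model.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability per token).

    Assumes the log-probabilities come from some language model;
    which model SciVidEval uses is not stated in the source.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy inputs: four tokens, each assigned probability 0.25 by the model.
    print(perplexity([math.log(0.25)] * 4))
    # Toy embeddings standing in for a frame and its narration segment.
    print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

Under these definitions, lower perplexity indicates more predictable narration text, and a higher cosine similarity indicates closer agreement between what is shown and what is said.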
