WebVid

Benchmarks

Task Name	Dataset Name	SOTA Result
Composed Video Retrieval	WebVid-CoVR (test)	R@15,982	86
Video Reconstruction	WebVid 10M	PSNR 35.76	45
Video Reconstruction	Webvid (val)	PSNR 34.75	16
Video Captioning Evaluation	WebVid 10M	CLIP Score 61.377	12
Video Annotation	WebVid-10M	Avg Length 214.49	12
Video reconstruction	WebVid-10M (val)	VCPR492.5	10
Video Generation	WebVid mini (val)	FVD @ 1 Frame 526	10
Trajectory-based image animation	WebVid (test)	LPIPS 0.1562	8
Video Generation	WebVid (test)	LPIPS 0.135	7
Watermark Extraction	WebVid 1000 videos 10M	Average Frame Score (N=3) 99.98	6
Video Watermarking Visual Quality	WebVid 10M	FVD 361.3	6
Camera-controlled video generation	WebVid	RotErr 3.162	5
Video Generation	webvid (test)	SubC 92.8	5
Semantic Consistency	WebVid10M	CLIP-F 0.93	5
Text-to-Video Generation	WebVid 400 samples (val)	CLIP Score 0.308	4
Cross-View Video Generation	WebVid 200 monocular videos (test)	Subjective Consistency 92.18	4
Image-to-Video Generation	WebVid (test)	FID 29.94	4
Text-to-Video Generation	WebVid (test)	FID 61.52	4
Video Editing	WebVid	PSNR 33.07	4
Image-to-Video generation	WebVid 10M	Temporal Coherence 96.9	4
Text-to-Video Generation	WebVid-10M 2-million	CLIP Score 48.3	4
Text-to-Video Generation	WebVid-10M (val)	FVD 292.35	4
Image-to-Video generation with fine-grained motion control	WebVid-10M 1k (val)	FVD 59.88	3
Multi-subject and motion customization	WebVid subject pairs (test)	CLIP Text Alignment Score 0.662	3
Geometry Consistency	WebVid10M	Rot. AUC @ 5° 25.2	3

Showing 25 of 31 rows