DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes

About

Implicit neural representations for video (NeRV) have recently emerged as a new approach to high-quality video representation. However, existing works employ a single network to represent the entire video, which implicitly confuses static and dynamic information. This leads to an inability to effectively compress redundant static information and a lack of explicit modeling of globally temporally coherent dynamic details. To solve these problems, we propose DS-NeRV, which decomposes videos into sparse learnable static codes and dynamic codes without the need for explicit optical flow or residual supervision. By setting different sampling rates for the two codes and applying weighted-sum and interpolation sampling methods, DS-NeRV efficiently utilizes redundant static information while maintaining high-frequency details. Additionally, we design a cross-channel attention-based (CCA) fusion module to efficiently fuse these two codes for frame decoding. Thanks to its separate static and dynamic code representations, our approach achieves high-quality reconstruction of 31.2 PSNR with only 0.35M parameters and outperforms existing NeRV methods in many downstream tasks. Our project website is at https://haoyan14.github.io/DS-NeRV.

Hao Yan, Zhihui Ke, Xiaobo Zhou, Tie Qiu, Xidong Shi, Dadong Jiang • 2024
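
The abstract describes storing a sparse set of learnable codes and sampling a per-frame code from them at different rates for the static and dynamic streams. The following is a minimal sketch of that idea, not the authors' implementation: the function names `lerp` and `sample_code`, the code counts, and the vector sizes are all hypothetical, and a single linear blend of the two nearest stored codes stands in for both the weighted-sum and the interpolation sampling mentioned in the abstract, which may differ from what the paper actually does.

```python
# Illustrative sketch only: code counts, code shapes, and function names are
# assumptions and are not taken from the DS-NeRV implementation.

def lerp(a, b, w):
    """Blend two equal-length vectors: (1 - w) * a + w * b."""
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]


def sample_code(codes, t, num_frames):
    """Return a code for frame t by blending the two nearest stored codes.

    codes      -- list of K vectors assumed to be spread evenly over the video
    t          -- frame index in [0, num_frames - 1]
    num_frames -- total number of frames in the video
    """
    if len(codes) == 1 or num_frames == 1:
        return list(codes[0])
    # Map the frame index onto the continuous code axis [0, K - 1].
    pos = t * (len(codes) - 1) / (num_frames - 1)
    # Clamp so that the last frame blends codes K-2 and K-1 with weight 1.
    lo = min(int(pos), len(codes) - 2)
    w = pos - lo
    return lerp(codes[lo], codes[lo + 1], w)


# Hypothetical setup: 9 frames, 2 static codes (low sampling rate) and
# 5 dynamic codes (higher sampling rate).
num_frames = 9
static_codes = [[0.0, 0.0], [8.0, 16.0]]
dynamic_codes = [[float(k)] for k in range(5)]

static_t = sample_code(static_codes, 2, num_frames)
dynamic_t = sample_code(dynamic_codes, 3, num_frames)
print(static_t, dynamic_t)
```

In a trained model the stored codes would be learnable tensors, and the two sampled codes would then be passed to a fusion module and decoder; those stages are omitted here.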

Related benchmarks

Task	Dataset	Result
Video Compression	UVG	--	55
Video Inpainting	DAVIS (test)	PSNR 35.3	54
Video Reconstruction	Bunny	PSNR 38.65	34
Video Reconstruction	DAVIS	--	29
Video Reconstruction	UVG (test)	Beauty Score 33.97	20
Video Inpainting	DAVIS (central mask)	b-swan Score 28.33	8
Video Reconstruction	UVG 600 frames	Decoding Speed (FPS) 63.54	8
Video Decoding	UVG	FPS 63.54	4
Video Inpainting	DAVIS disperse mask (test)	b-swan 32.28	4
Video Reconstruction	UVG 1080p standard	Beauty Score 33.29	4
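
Several results above are reported as PSNR. For reference, the standard definition is PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the reference and the reconstruction. A minimal sketch follows; the function name `psnr`, the flat-list pixel layout, and the 8-bit default for `max_value` are assumptions made for illustration, and the page does not state the exact evaluation protocol behind the benchmark numbers.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    max_value defaults to 255.0, which assumes 8-bit pixel values.
    """
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example with two pixels, each off by 10 intensity levels.
print(round(psnr([100.0, 100.0], [110.0, 90.0]), 2))
```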

