DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos

About

Existing implicit neural representation (INR) methods do not fully exploit the spatiotemporal redundancies in videos. Index-based INRs ignore content-specific spatial features, and hybrid INRs ignore the contextual dependency on adjacent frames; both shortcomings lead to poor modeling of scenes with large motion or dynamics. We analyze this limitation from the perspective of function fitting and show the importance of the frame difference. To exploit explicit motion information, we propose Difference Neural Representation for Videos (DNeRV), which consists of two streams: one for content and one for frame difference. We also introduce a collaborative content unit for effective fusion of the two streams' features. We evaluate DNeRV on video compression, inpainting, and interpolation. DNeRV achieves results competitive with state-of-the-art neural compression approaches and outperforms existing implicit methods on downstream inpainting and interpolation for $960 \times 1920$ videos.
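To make the "frame difference" signal concrete, the sketch below computes backward differences $D_t = F_t - F_{t-1}$ between adjacent frames and assembles a possible input for a difference stream. It is a minimal illustration under stated assumptions, not the authors' implementation: frames are flattened lists of pixel values in $[0, 1]$, and the choice to concatenate a backward and a forward difference (zero-padded at the video boundaries) is a hypothetical design. The names `frame_differences` and `difference_stream_input` are ours.

```python
from typing import List, Sequence

# Assumption: a frame is a flattened list of pixel values in [0, 1].
Frame = List[float]


def _subtract(a: Sequence[float], b: Sequence[float]) -> Frame:
    """Element-wise a - b for two frames with the same number of pixels."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return [x - y for x, y in zip(a, b)]


def frame_differences(frames: Sequence[Frame]) -> List[Frame]:
    """Backward differences D_t = F_t - F_{t-1} for t = 1, ..., T-1."""
    return [_subtract(cur, prev) for prev, cur in zip(frames, frames[1:])]


def difference_stream_input(frames: Sequence[Frame], t: int) -> Frame:
    """Hypothetical difference-stream input for frame t.

    Concatenates the backward difference F_t - F_{t-1} and the forward
    difference F_{t+1} - F_t, substituting zeros where a neighbour is missing
    (first and last frame). This layout is an assumption for illustration.
    """
    zeros = [0.0] * len(frames[t])
    backward = _subtract(frames[t], frames[t - 1]) if t > 0 else zeros
    forward = _subtract(frames[t + 1], frames[t]) if t + 1 < len(frames) else zeros
    return backward + forward


if __name__ == "__main__":
    video = [[0.0, 0.5], [0.25, 0.5], [0.25, 1.0]]
    print(frame_differences(video))           # [[0.25, 0.0], [0.0, 0.5]]
    print(difference_stream_input(video, 1))  # [0.25, 0.0, 0.0, 0.5]
```

For a static scene every difference is zero, so the difference stream carries almost no information; for scenes with large motion the differences are large, which is the explicit motion cue the abstract argues index-based and hybrid INRs fail to use.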

Qi Zhao, M. Salman Asif, Zhan Ma · 2023

Related benchmarks

Task	Dataset	Result
Video Reconstruction	Bunny	PSNR 34.09	34
Video Reconstruction	DAVIS	PSNR 29.66	33
Video Regression	UVG	Beauty 40	20
Video Reconstruction	UVG (test)	Beauty Score 33.16	20
Neural Video Representation	Video per-frame	GFLOPs 181	12
Video Representation	DAVIS	PSNR (Average) 30.39	11
Neural Video Representation	Literature Comparison	GFLOPs 181	10
Video Inpainting	DAVIS (central mask)	b-swan Score 26.47	8
Video Reconstruction	UVG 600 frames	Decoding Speed (FPS) 52.2	8
Video Inpainting	DAVIS 960 × 1920	Bmx-B 25.7	6

Showing 10 of 17 rows
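Most rows above report PSNR, the standard peak signal-to-noise ratio $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The following is a minimal sketch of that formula, assuming flattened frames with pixel values in $[0, 1]$ (so `max_value = 1.0`); the function name `psnr` is ours, and the exact averaging protocol behind the benchmark numbers (per frame or per video) is not stated on this page.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstruction: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # A uniform error of 0.1 gives MSE = 0.01, i.e. roughly 20 dB.
    print(psnr([0.0, 1.0], [0.1, 0.9]))
```

Higher is better: each additional 10 dB corresponds to a tenfold reduction in mean squared error.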
