
Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement

About

Fully immersive experiences that tightly integrate 6-DoF visual and auditory interaction are essential for virtual and augmented reality. While such experiences can be achieved through computer-generated content, constructing them directly from real-world captured videos remains largely unexplored. We introduce Immersive Volumetric Videos (IVV), a new volumetric media format designed to provide large 6-DoF interaction spaces, audiovisual feedback, and high-resolution, high-frame-rate dynamic content. To support IVV construction, we present ImViD, a multi-view, multi-modal dataset built upon a space-oriented capture philosophy. Our custom capture rig enables synchronized multi-view video-audio acquisition during motion, facilitating efficient capture of complex indoor and outdoor scenes with rich foreground-background interactions and challenging dynamics. The dataset provides 5K-resolution videos at 60 FPS with durations of 1-5 minutes, offering richer spatial, temporal, and multimodal coverage than existing benchmarks. Leveraging this dataset, we develop a dynamic light field reconstruction framework built upon a Gaussian-based spatio-temporal representation, incorporating flow-guided sparse initialization, joint camera temporal calibration, and multi-term spatio-temporal supervision for robust and accurate modeling of complex motion. We further propose, to our knowledge, the first method for sound field reconstruction from such multi-view audiovisual data. Together, these components form a unified pipeline for immersive volumetric video production. Extensive benchmarks and immersive VR experiments demonstrate that our pipeline generates high-quality, temporally stable audiovisual volumetric content with large 6-DoF interaction spaces. This work provides both a foundational definition and a practical construction methodology for immersive volumetric videos.

Zhengxian Yang, Shengqi Wang, Shi Pan, Hongshuai Li, Haoxiang Wang, Lin Li, Guanjun Li, Zhengqi Wen, Borong Lin, Jianhua Tao, Tao Yu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| Dynamic Light Field Reconstruction | ImViD 300 frames per scene (test) | PSNR 33.51 | 25 |
| Dynamic Light Field Reconstruction | Google Immersive (test) | PSNR 32.48 | 20 |
| Dynamic Light Field Reconstruction | MPEG-GSC views (test) | PSNR 36.16 | 15 |
| Dynamic Light Field Reconstruction | MeetRoom (Discussion) | PSNR 35.01 | 5 |
| Dynamic Light Field Reconstruction | MeetRoom Trimming | PSNR 33.33 | 5 |
| Dynamic Light Field Reconstruction | MeetRoom VRheadset | PSNR 32.08 | 5 |
| Dynamic Light Field Reconstruction | MeetRoom Average | PSNR 33.47 | 5 |
| Dynamic Light Field Reconstruction | ImViD Scene 1 Opera (test) | PSNR 33.51 | 5 |
| Dynamic Light Field Reconstruction | ImViD Scene 2 Laboratory (test) | PSNR 31.1 | 5 |
| Dynamic Light Field Reconstruction | ImViD Scene 5 Rendition (test) | PSNR 27.84 | 5 |
Showing 10 of 13 rows
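
The results above are reported as PSNR (peak signal-to-noise ratio, in dB). For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition, assuming pixel intensities normalized to [0, 1]. The function name `psnr` and its parameters are illustrative, not taken from the paper, and the paper's exact evaluation protocol (per-frame versus per-scene averaging, color space) is not stated on this page. The last lines only check that the listed MeetRoom Average is consistent with the arithmetic mean of the three listed MeetRoom scenes.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Standard PSNR in dB between two equally sized sequences of pixel intensities.

    Assumption: intensities lie in [0, max_value]; this is a generic definition,
    not the paper's evaluation code.
    """
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# PSNR values copied from the MeetRoom rows of the table.
meetroom = {"Discussion": 35.01, "Trimming": 33.33, "VRheadset": 32.08}
meetroom_mean = sum(meetroom.values()) / len(meetroom)
print(f"MeetRoom mean PSNR: {meetroom_mean:.2f}")
```

Because PSNR is logarithmic in the mean squared error, a gain of about 3 dB corresponds to roughly halving the error.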
