L4GM: Large 4D Gaussian Reconstruction Model
About
We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input in a single feed-forward pass that takes only a second. Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects from Objaverse. This dataset depicts 44K diverse objects with 110K animations rendered from 48 viewpoints, resulting in 12M videos with a total of 300M frames. We keep L4GM simple for scalability and build directly on top of LGM, a pretrained 3D Large Reconstruction Model that outputs 3D Gaussian ellipsoids from multiview image input. L4GM outputs a per-frame 3D Gaussian Splatting representation from video frames sampled at a low fps and then upsamples the representation to a higher fps to achieve temporal smoothness. We add temporal self-attention layers to the base LGM to help it learn consistency across time, and use a per-timestep multiview rendering loss to train the model. The representation is upsampled to a higher frame rate by training an interpolation model that produces intermediate 3D Gaussian representations. We show that L4GM, although trained only on synthetic data, generalizes extremely well to in-the-wild videos, producing high-quality animated 3D assets.
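To make the temporal self-attention idea more concrete, here is a minimal NumPy sketch of one such layer. It is an illustration under stated assumptions, not the authors' implementation: the tensor layout `(B, T, V, L, C)` (batch, timesteps, views, tokens per image, channels), the single-head attention, the function name `temporal_self_attention`, and the zero-initialized output projection `w_o` are all hypothetical choices made for this example. The sketch folds the view axis into the batch so that tokens attend across time within each view, and adds the result back as a residual.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_self_attention(x, w_q, w_k, w_v, w_o):
    """Single-head self-attention over the time axis (illustrative only).

    x: array of shape (B, T, V, L, C); the layout is an assumption.
    w_q, w_k, w_v, w_o: (C, C) projection matrices.
    """
    B, T, V, L, C = x.shape
    # Fold views into the batch: (B, T, V, L, C) -> (B*V, T*L, C),
    # so every token sees all timesteps of its own view.
    tokens = x.transpose(0, 2, 1, 3, 4).reshape(B * V, T * L, C)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(C), axis=-1)
    # Residual connection around the attention output.
    out = tokens + (attn @ v) @ w_o
    # Restore the original layout.
    return out.reshape(B, V, T, L, C).transpose(0, 2, 1, 3, 4)


rng = np.random.default_rng(0)
B, T, V, L, C = 1, 4, 2, 3, 8
x = rng.standard_normal((B, T, V, L, C))
w_q, w_k, w_v = (0.1 * rng.standard_normal((C, C)) for _ in range(3))

# Assumption: a zero-initialized output projection makes the new layer
# an identity map at the start of training.
w_o_zero = np.zeros((C, C))
y_init = temporal_self_attention(x, w_q, w_k, w_v, w_o_zero)

# With a non-zero projection, information flows across timesteps.
w_o = 0.1 * rng.standard_normal((C, C))
y = temporal_self_attention(x, w_q, w_k, w_v, w_o)
```

With `w_o_zero`, the layer returns its input unchanged, which is one common way to add new layers to a pretrained network without disturbing it at initialization; the abstract does not say whether L4GM does this. Because views are folded into the batch, this particular layer mixes information only across time, leaving cross-view mixing to the base model's existing attention.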
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 4D Mesh Reconstruction | Objaverse (test) | CD | 0.1107 | 13 |
| 4D Generation | Consistent4D | CLIP Score | 0.908 | 10 |
| 4D Synthesis | Monocular Video | FPS | 7.8 | 8 |
| Video-to-4D reconstruction | 24 evaluation videos Emu, Sora, Veo, and ActivityNet (test) | LPIPS | 0.12 | 7 |
| 4D Object Reconstruction | Objaverse-4D Synthesized scenes (24 examples) | Overall Quality | 65.4 | 6 |
| 4D mesh generation | Truebones Zoo (test) | CD | 0.0809 | 6 |
| 4D Generation | DAVIS 2019 (test) | Geometry Quality | 3.017 | 5 |
| 4D Generation | Single Dynamic Object | Generation Time (min) | 3.5 | 5 |
| 4D Object Reconstruction | DeformingThings (test) | CD | 0.2633 | 5 |
| Novel View Synthesis | Objaverse | PSNR | 18.07 | 5 |