PhysFlow: Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation
About
Realistic simulation of dynamic scenes requires accurately capturing diverse material properties and modeling complex object interactions grounded in physical principles. However, existing methods are constrained to basic material types with limited predictable parameters, making them insufficient to represent the complexity of real-world materials. We introduce PhysFlow, a novel approach that leverages multi-modal foundation models and video diffusion to achieve enhanced 4D dynamic scene simulation. Our method utilizes multi-modal models to identify material types and initialize material parameters through image queries, while simultaneously inferring 3D Gaussian splats for detailed scene representation. We further refine these material parameters using video diffusion with a differentiable Material Point Method (MPM) and optical flow guidance rather than render loss or Score Distillation Sampling (SDS) loss. This integrated framework enables accurate prediction and realistic simulation of dynamic interactions in real-world scenarios, advancing both accuracy and flexibility in physics-based simulations.
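The refinement step described above (adjusting material parameters so that simulated motion matches a flow signal) can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and is not taken from PhysFlow: a 1D unit-mass spring stands in for the differentiable MPM simulator, a synthetic rollout stands in for optical flow extracted from generated video, and the names `simulate_flow` and `refine_stiffness` are hypothetical. It shows only the general pattern of differentiating a flow-matching loss through a simulator with respect to one material parameter.

```python
# Toy stand-in for flow-guided material refinement. This is NOT the paper's
# MPM simulator or video-diffusion pipeline: a 1D spring replaces the scene,
# and a synthetic rollout replaces optical flow taken from generated video.


def simulate_flow(stiffness, steps=40, dt=0.02):
    """Roll out a unit-mass spring with semi-implicit Euler steps.

    Returns the per-step velocities (used here as a 1D "flow" signal) and
    their derivatives with respect to stiffness.
    """
    x, v = 1.0, 0.0    # initial displacement and velocity
    dx, dv = 0.0, 0.0  # sensitivities dx/dk and dv/dk
    flow, dflow = [], []
    for _ in range(steps):
        # Differentiate each update rule by hand (forward-mode sensitivity),
        # using the old x and dx before they are overwritten.
        dv += -(x + stiffness * dx) * dt
        v += -stiffness * x * dt
        dx += dv * dt
        x += v * dt
        flow.append(v)
        dflow.append(dv)
    return flow, dflow


def flow_loss(flow, target_flow):
    """Mean squared difference between simulated and target flow."""
    return sum((f - t) ** 2 for f, t in zip(flow, target_flow)) / len(flow)


def refine_stiffness(target_flow, initial_stiffness, lr=1.0, iters=300):
    """Gradient descent on the flow loss with respect to stiffness."""
    k = initial_stiffness
    for _ in range(iters):
        flow, dflow = simulate_flow(k)
        # Chain rule: dL/dk = (2/N) * sum((flow - target) * dflow/dk).
        grad = 2.0 * sum(
            (f - t) * d for f, t, d in zip(flow, target_flow, dflow)
        ) / len(flow)
        k -= lr * grad
    return k


# Target flow produced with a "true" stiffness of 4.0 (synthetic data).
target_flow, _ = simulate_flow(4.0)
initial_loss = flow_loss(simulate_flow(2.0)[0], target_flow)
estimated = refine_stiffness(target_flow, initial_stiffness=2.0)
final_loss = flow_loss(simulate_flow(estimated)[0], target_flow)
print(f"estimated stiffness: {estimated:.4f}")
```

Starting from an initial guess of 2.0, which plays the role of the foundation model's parameter initialization, the estimate moves toward the stiffness that generated the target flow. A real pipeline would replace the hand-derived sensitivities with automatic differentiation through the simulator and the synthetic target with flow estimated from video frames.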
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| System Identification | Synthetic dataset | RE | 1 | 50 |
| System Identification | Synthetic dataset | Rel Error (delta_mu) | 0.004 | 12 |
| Physically-grounded Video Generation | Physically-grounded Video Evaluation Set (Human Designed, Real World, and AI Generated scenes) | OC | 17.96 | 5 |
| Physical motion simulation | Real-world dataset (test) | ECMS | 3.08 | 4 |
| Physical Simulation Realism | Real-world dataset (PhysDreamer, Instant-NGP, NeRFStudio, Mip-NeRF 360, etc.) (test) | Physical Realism | 3.44 | 4 |
| Motion Simulation | PhyGenBench PhysGen scenes | ECMS | 0.85 | 3 |