MSPT: Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention
About
A key scalability challenge in neural solvers for industrial-scale physics simulations is efficiently capturing both fine-grained local interactions and long-range global dependencies across millions of spatial elements. We introduce the Multi-Scale Patch Transformer (MSPT), an architecture that combines local point attention within patches with global attention to coarse patch-level representations. To partition the input domain into spatially-coherent patches, we employ ball trees, which handle irregular geometries efficiently. This dual-scale design enables MSPT to scale to millions of points on a single GPU. We validate our method on standard PDE benchmarks (elasticity, plasticity, fluid dynamics, porous flow) and large-scale aerodynamic datasets (ShapeNet-Car, Ahmed-ML), achieving state-of-the-art accuracy with substantially lower memory footprint and computational cost.
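The paragraph above describes two mechanisms: partitioning an irregular point cloud into spatially coherent patches via a ball tree, then mixing local attention within each patch with global attention to coarse patch-level summaries. The sketch below illustrates that dual-scale idea in a few dozen lines; it is not the authors' implementation, and the simplified recursive split, helper names, and NumPy-based attention are all assumptions for illustration.

```python
import numpy as np

def ball_tree_patches(points, leaf_size):
    """Recursively split a point cloud into spatially coherent patches.

    Simplified ball-tree-style construction (an assumption, not MSPT's exact
    procedure): each node splits its points by their projection onto the
    direction from the centroid to the farthest point, so leaves group
    nearby points even on irregular geometries. Returns index arrays.
    """
    patches = []

    def split(ids):
        if len(ids) <= leaf_size:
            patches.append(ids)
            return
        pts = points[ids]
        centroid = pts.mean(axis=0)
        # Direction of largest spread: centroid -> farthest point.
        far = pts[np.argmax(np.linalg.norm(pts - centroid, axis=1))]
        proj = (pts - centroid) @ (far - centroid)
        order = np.argsort(proj)
        half = len(ids) // 2
        split(ids[order[:half]])
        split(ids[order[half:]])

    split(np.arange(len(points)))
    return patches

def _softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_scale_attention(feats, patches):
    """Toy single-head dual-scale attention over (N, d) features.

    Each point attends (1) locally to points in its own patch and
    (2) globally to one coarse summary token per patch (here a mean
    pool, an illustrative choice). Cost is O(N * patch_size + N * P)
    instead of O(N^2) for full attention.
    """
    d = feats.shape[1]
    patch_means = np.stack([feats[p].mean(axis=0) for p in patches])
    out = np.empty_like(feats)
    for p in patches:
        q = feats[p]
        local = _softmax(q @ feats[p].T / np.sqrt(d)) @ feats[p]
        global_ = _softmax(q @ patch_means.T / np.sqrt(d)) @ patch_means
        out[p] = local + global_
    return out
```

Because each point only ever attends to its own patch plus one token per patch, memory grows with patch size and patch count rather than quadratically in the number of points, which is what lets this style of model reach millions of points on a single GPU.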
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Operator learning | Airfoil Structured Mesh (test) | Relative L2 Error | 0.0051 | 15 |
| Operator learning | Pipe Structured Mesh (test) | Relative L2 Error | 0.0031 | 15 |
| Operator learning | Navier-Stokes Regular Grid (test) | Relative L2 Error | 0.0632 | 15 |
| CFD field reconstruction | ShapeNet Car (test) | Volume Error | 1.89 | 15 |
| Operator learning | Plasticity Structured Mesh (test) | Relative L2 Error | 0.001 | 15 |
| Operator learning | Darcy Regular Grid (test) | Relative L2 Error | 0.0063 | 15 |
| Operator learning | Elasticity Point Cloud (test) | Relative L2 Error | 0.0048 | 13 |
| CFD field reconstruction | AhmedML (test) | Volume Metric | 2.04 | 11 |