
Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training

About

Pre-training has proven effective in addressing data scarcity and performance limitations when solving PDE problems with neural operators. However, challenges remain: PDE datasets are heterogeneous in equation type, which leads to high errors when they are mixed in training. Additionally, dense pre-training models that scale parameters by increasing network width or depth incur significant inference costs. To tackle these challenges, we propose a novel Mixture-of-Experts Pre-training Operator Transformer (MoE-POT), a sparsely activated architecture that scales parameters efficiently while controlling inference costs. Specifically, our model adopts a layer-wise router-gating network that dynamically selects 4 routed experts from 16 expert networks during inference, enabling the model to focus on equation-specific features. We also integrate 2 shared experts, which aim to capture properties common to all PDEs and reduce redundancy among the routed experts. The final output is computed as the weighted average of the outputs of all activated experts. We pre-train models ranging from 30M to 0.5B parameters on 6 public PDE datasets. Our model with 90M activated parameters achieves up to a 40% reduction in zero-shot error compared with existing models with 120M activated parameters. We also conduct an interpretability analysis showing that the dataset type can be inferred from the router-gating network's decisions, which validates the rationale and effectiveness of the MoE architecture.
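A minimal sketch of the sparse routing described above, written in plain Python so it runs without dependencies. Only the counts come from the abstract (16 routed experts, top-4 selection, 2 shared experts, a weighted average over activated experts); everything else is an assumption. The experts are hypothetical random tanh layers, the router is a single linear map whose top-k logits are passed through a softmax, and each shared expert is assumed to receive the same weight as an average routed expert so that all activated weights sum to 1. This is an illustration, not the MoE-POT implementation.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, rng):
    # Hypothetical expert: one random linear map followed by tanh.
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [math.tanh(sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in w]

    return expert


class MoELayer:
    def __init__(self, dim, n_routed=16, top_k=4, n_shared=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.routed = [make_expert(dim, rng) for _ in range(n_routed)]
        self.shared = [make_expert(dim, rng) for _ in range(n_shared)]
        # Assumed router: one logit per routed expert from a linear map of the input.
        self.router = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_routed)]

    def route(self, x):
        logits = [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in self.router]
        # Keep the top-k routed experts and renormalize their scores with a softmax.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        gates = softmax([logits[i] for i in top])
        return top, gates

    def __call__(self, x):
        top, gates = self.route(x)
        n_active = self.top_k + len(self.shared)
        # Assumption: each shared expert weighs as much as an average routed expert,
        # so the weights below sum to 1 (a weighted average over activated experts).
        weights = [g * self.top_k / n_active for g in gates]
        weights += [1.0 / n_active] * len(self.shared)
        experts = [self.routed[i] for i in top] + self.shared
        out = [0.0] * len(x)
        for w, expert in zip(weights, experts):
            y = expert(x)
            out = [o + w * y_j for o, y_j in zip(out, y)]
        return out


layer = MoELayer(dim=8)
x = [0.1 * i for i in range(8)]
selected, gates = layer.route(x)  # 4 routed-expert indices and their gate weights
y = layer(x)                      # output vector of length 8
```

Only 6 of the 18 experts run for a given input, which is how a sparse layer keeps its activated parameter count, and therefore its inference cost, well below its total parameter count.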

Hong Wang, Haiyang Xin, Jie Wang, Xuanze Yang, Fei Zha, Huanshuo Dong, Yan Jiang • 2025

Related benchmarks

Task | Dataset | L2RE (L2 relative error) | Rank
Operator learning | PDEBench DR | 0.0096 | 28
Operator learning | PDEBench SWE | 0.0022 | 28
Operator learning | PDEBench CNS (η=0.1, ζ=0.1) | 0.0083 | 25
Operator learning | PDEArena NS-cond | 0.163 | 25
Operator learning | FNO-ν 1e-4 | 0.0119 | 25
Operator learning | FNO-ν 1e-5 | 2.39 | 25
Operator learning | PDEBench CNS (η=1, ζ=0.01) | 1.41 | 25
Operator learning | PDEBench CNS (η=0.1, ζ=0.01) | 0.009 | 25
Operator learning | PDEArena NS | 3.13 | 25
Operator learning | FNO-ν 1e-3 | 0.0031 | 25
(10 of 29 rows shown.)
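The benchmark scores listed for this paper are L2 relative errors (L2RE). A minimal sketch of the usual definition, ‖û − u‖₂ / ‖u‖₂ computed on one flattened sample and then averaged over a test set. The averaging step, and how each benchmark aggregates over time steps and channels, are assumptions here rather than details taken from the source.

```python
import math


def l2_relative_error(pred, target):
    """||pred - target||_2 / ||target||_2 for one flattened sample."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    num = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))
    den = math.sqrt(sum(t ** 2 for t in target))
    if den == 0.0:
        raise ValueError("target has zero norm")
    return num / den


def mean_l2re(preds, targets):
    """Average per-sample L2RE over a test set (assumed aggregation)."""
    errors = [l2_relative_error(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)


# A prediction off by 0.5 against a target of norm 5 gives an L2RE of 0.1.
err = l2_relative_error([3.0, 4.5], [3.0, 4.0])
```

Because the error is normalized by the target's norm, L2RE values are comparable across datasets whose solution fields have very different magnitudes.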
