AOT-POT: Adaptive Operator Transformation for Large-Scale PDE Pre-training

About

Pre-training neural operators on diverse partial differential equation (PDE) datasets has emerged as a promising direction for building general-purpose surrogate models in scientific machine learning. However, the inherent complexity and structural diversity of PDE solution operators make multi-PDE pre-training fundamentally challenging. Existing methods mainly address this by increasing model capacity, while leaving the target solution operators unchanged. Inspired by classical numerical analysis, we instead propose to transform complex and diverse solution operators into simpler, better-aligned forms that are easier to model jointly. Since the optimal transformation varies across PDE types, it must be adaptive and input-dependent, allowing a single neural operator to approximate an entire family of operators. We instantiate this idea as AOT-POT (adaptive operator-transformation for pre-training operator transformer), which expands hidden representations into multiple parallel streams, adaptively aggregates and redistributes them before and after each sub-layer, and mixes streams through Sinkhorn-projected doubly stochastic matrices for stable training. These mechanisms together reshape diverse solution operators into a unified form that can be effectively modeled by a single architecture. Empirically, AOT-POT achieves state-of-the-art performance on 12 PDE benchmarks with only 3% additional parameters, reducing relative L2 error by up to 77.6% (40.9% on average). Fine-tuning AOT-POT further reduces L2 error by up to 92% on in-domain PDEs and 89% on out-of-domain PDEs (unseen types during pre-training), demonstrating that adaptive operator transformation is an effective and complementary direction for advancing PDE foundation models beyond simply scaling model capacity.

Qitan Lv, Hong Wang, Zhongkai Hao, Wen Wu, Xuenan Xu, Bowen Zhou, Feng Wu, Chao Zhang • 2026
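
The abstract mentions mixing parallel streams through Sinkhorn-projected doubly stochastic matrices. The following is a minimal pure-Python sketch of that general idea, not the authors' implementation: the function names `sinkhorn` and `mix_streams`, the iteration count, and the example logits and streams are all hypothetical. It alternates row and column normalization on a positive matrix until rows and columns both sum to (approximately) one, then uses the result to mix a few toy streams.

```python
import math


def sinkhorn(logits, n_iters=50):
    """Approximately project a square matrix of logits onto the set of
    doubly stochastic matrices via alternating row/column normalization.

    Illustrative only: the paper's exact parameterization is not given here.
    """
    n = len(logits)
    # Exponentiate so every entry is strictly positive; subtracting the
    # maximum keeps the exponentials numerically well behaved.
    peak = max(max(row) for row in logits)
    m = [[math.exp(v - peak) for v in row] for row in logits]
    for _ in range(n_iters):
        # Row step: make each row sum to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column step: make each column sum to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def mix_streams(mixing, streams):
    """Replace each stream with a weighted combination of all streams,
    using one row of `mixing` as the weights for each output stream."""
    width = len(streams[0])
    return [
        [sum(w * s[k] for w, s in zip(weights, streams)) for k in range(width)]
        for weights in mixing
    ]


# Hypothetical example: 3 streams, each a 2-dimensional hidden vector.
logits = [[0.2, -0.4, 1.0], [0.7, 0.0, -0.3], [-0.8, 0.5, 0.1]]
mixing = sinkhorn(logits)
streams = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mixed = mix_streams(mixing, streams)
```

Because the rows of a doubly stochastic matrix are non-negative and sum to one, each mixed stream is a convex combination of the inputs; because the columns also sum to one, the sum of the streams is preserved. Neither property lets the mixing step amplify the representations, which is one plausible reading of the "stable training" remark in the abstract.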

Related benchmarks

Task	Dataset	Metric	Value
Operator learning	PDEBench DR	L2RE	0.0064	28
Operator learning	PDEBench SWE	L2RE	7.20e-4	28
Operator learning	FNO-ν 1e-5	L2RE	1.45	25
Operator learning	FNO-ν 1e-4	L2RE	0.0061	25
Operator learning	FNO-ν 1e-3	L2RE	0.0018	25
Operator learning	PDEBench CNS (η=1, ζ=0.1)	L2RE	0.0043	25
Operator learning	PDEBench CNS (η=1, ζ=0.01)	L2RE	0.491	25
Operator learning	PDEBench CNS (η=0.1, ζ=0.1)	L2RE	0.0079	25
Operator learning	PDEBench CNS (η=0.1, ζ=0.01)	L2RE	0.0032	25
Operator learning	PDEArena NS	L2RE	2.36	25

Showing 10 of 37 rows
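
The table reports L2 relative error (L2RE), and the abstract quotes percentage reductions in that error. As a rough illustration, assuming the common definition of L2RE as the L2 norm of the prediction error divided by the L2 norm of the reference solution, both quantities can be sketched as below. The function names and the toy numbers are hypothetical and are not taken from the paper or the table.

```python
import math


def l2_relative_error(pred, target):
    """L2 norm of (pred - target) divided by the L2 norm of target.

    Assumes the commonly used definition; the benchmark's exact averaging
    over samples, channels, and time steps may differ.
    """
    num = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))
    den = math.sqrt(sum(t ** 2 for t in target))
    return num / den


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error improves on baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Toy values chosen for easy arithmetic, not real benchmark data.
target = [3.0, 4.0]   # reference solution, norm 5
pred = [3.0, 3.5]     # prediction, error norm 0.5
err = l2_relative_error(pred, target)
reduction = relative_reduction(0.2, err)
```

With these toy inputs the error norm is 0.5 against a reference norm of 5, so the relative error is about 0.1, and going from a hypothetical baseline of 0.2 to 0.1 is a reduction of about 50%.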
