Small Models, Strong Priors: Architectural Inductive Bias for Parameter-Efficient Neural PDE Solvers

About

Neural PDE solvers have followed the scaling trajectory of vision and language, with recent foundation models reaching billions of parameters. We argue that scale is a poor substitute for architectural inductive bias in this domain: structured priors deliver outsized parameter efficiency, and the pattern of where they succeed and fail is itself informative about what they capture. We instantiate this argument in WaveLiT, an architecture combining a discrete wavelet transform for lossless multi-resolution tokenization, an augmented linear attention block, a shared-weight multiscale feature pyramid, and a wavelet-domain auxiliary loss. Bespoke 1-10M-parameter WaveLiT models compete with foundation models of 100-1000× their size across eight TheWell benchmarks, with the largest gains on wave- and acoustic-dominated benchmarks where the wavelet-multiscale prior fits the dominant dynamical structure and small per-step errors do not compound geometrically under rollout. Trained jointly across all eight benchmarks, a 10M-parameter foundation variant exhibits a structured, physically interpretable transfer pattern -- strongest where the wavelet-multiscale prior matches the dynamics, weakest on chaotic advection-dominated flows. The entire pipeline trains on a single GPU. The results suggest that small-model PDE performance is shaped by architectural inductive bias rather than scale, and that the structure of a prior's failures is a useful empirical signal about its content.
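
To make "lossless multi-resolution tokenization" concrete, here is a minimal sketch of one level of a 2D discrete wavelet transform. It assumes the Haar wavelet, a single scalar field, and even grid dimensions; the abstract does not say which wavelet family or how many levels WaveLiT uses, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical. Each 2x2 patch becomes one token with four channels (one coarse average and three detail coefficients), and the inverse recovers the field exactly, which is the sense in which the tokenization discards nothing.

```python
def haar_dwt2(field):
    """One level of the orthonormal 2D Haar transform (illustrative sketch).

    `field` is a list of rows with even height and width. Returns a
    half-resolution grid whose entries are 4-tuples (LL, LH, HL, HH).
    """
    rows, cols = len(field), len(field[0])
    if rows % 2 or cols % 2:
        raise ValueError("grid dimensions must be even")
    tokens = []
    for i in range(0, rows, 2):
        token_row = []
        for j in range(0, cols, 2):
            a, b = field[i][j], field[i][j + 1]
            c, d = field[i + 1][j], field[i + 1][j + 1]
            token_row.append((
                (a + b + c + d) / 2,  # LL: coarse average of the patch
                (a - b + c - d) / 2,  # LH: difference across columns
                (a + b - c - d) / 2,  # HL: difference across rows
                (a - b - c + d) / 2,  # HH: diagonal difference
            ))
        tokens.append(token_row)
    return tokens


def haar_idwt2(tokens):
    """Invert haar_dwt2, rebuilding the full-resolution field."""
    field = []
    for token_row in tokens:
        top, bottom = [], []
        for ll, lh, hl, hh in token_row:
            top += [(ll + lh + hl + hh) / 2, (ll - lh + hl - hh) / 2]
            bottom += [(ll + lh - hl - hh) / 2, (ll - lh - hl + hh) / 2]
        field += [top, bottom]
    return field


field = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0],
         [13.0, 14.0, 15.0, 16.0]]
tokens = haar_dwt2(field)          # 2 x 2 grid of 4-channel tokens
reconstructed = haar_idwt2(tokens)  # same values as `field`
```

Applying `haar_dwt2` again to the LL channel alone would give the next coarser resolution, which is one common way to build a multi-resolution pyramid; whether WaveLiT does exactly this is not stated here.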

Shyam Sankaran, Hanwen Wang, Paris Perdikaris • 2026

Related benchmarks

Task	Dataset	Metric	Result
Autoregressive rollout prediction	TheWell-ASM steps 21-60 T ∈ [21: 60] (test)	VRMSE	0.0268	20
One-step prediction	ASM* (TheWell) 1.0 (test)	VRMSE	0.0016	12
One-step prediction	HS (TheWell) 1.0 (test)	VRMSE	3.00e-4	12
One-step prediction	TheWell RB 1.0 (test)	VRMSE	0.0065	12
One-step prediction	TheWell SF 1.0 (test)	VRMSE	0.0015	12
One-step prediction	TheWell TRL2D 1.0 (test)	VRMSE	0.1167	12
One-step prediction	AM (TheWell) 1.0 (test)	VRMSE	0.0114	12
One-step prediction	VI (TheWell) 1.0 (test)	VRMSE	0.0301	12
One-step prediction	TheWell GS 1.0 (test)	VRMSE	6.00e-4	11
Autoregressive rollout prediction	TheWell-HS steps T ∈ [21: 46] (test)	VRMSE	0.002	8

Showing 10 of 40 rows
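
For reading the VRMSE column, the following is a minimal sketch of a variance-scaled root mean squared error. It assumes the usual form, RMSE divided by the standard deviation of the target field, with a small stabilizer `eps`; the exact averaging axes and `eps` value used for the numbers above are not given on this page, so treat both as assumptions. Under this definition a score of 1 corresponds to always predicting the target's mean, and 0 to a perfect prediction.

```python
import math


def vrmse(pred, target, eps=1e-7):
    """Variance-scaled RMSE over flat sequences (illustrative sketch).

    `eps` is an assumed stabilizer that avoids division by zero when
    the target field is constant.
    """
    n = len(target)
    mean_t = sum(target) / n
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    var = sum((t - mean_t) ** 2 for t in target) / n
    return math.sqrt(mse / (var + eps))


target = [0.0, 1.0, 2.0, 3.0]
mean_baseline = vrmse([1.5] * 4, target)  # predicting the mean scores close to 1
perfect = vrmse(target, target)           # exact prediction scores 0
```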
