
Small Models, Strong Priors: Architectural Inductive Bias for Parameter-Efficient Neural PDE Solvers

About

Neural PDE solvers have followed the scaling trajectory of vision and language, with recent foundation models reaching billions of parameters. We argue that scale is a poor substitute for architectural inductive bias in this domain: structured priors deliver outsized parameter efficiency, and the pattern of where they succeed and fail is itself informative about what they capture. We instantiate this argument in WaveLiT, an architecture combining a discrete wavelet transform for lossless multi-resolution tokenization, an augmented linear attention block, a shared-weight multiscale feature pyramid, and a wavelet-domain auxiliary loss. Bespoke 1-10M-parameter WaveLiT models compete with foundation models 100-1000× their size across eight TheWell benchmarks. The largest gains come on wave- and acoustic-dominated benchmarks, where the wavelet-multiscale prior fits the dominant dynamical structure and small per-step errors do not compound geometrically under rollout. Trained jointly across all eight benchmarks, a 10M-parameter foundation variant exhibits a structured, physically interpretable transfer pattern: strongest where the wavelet-multiscale prior matches the dynamics and weakest on chaotic, advection-dominated flows. The entire pipeline trains on a single GPU. The results suggest that small-model PDE performance is shaped by architectural inductive bias rather than scale, and that the structure of a prior's failures is a useful empirical signal about its content.
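The abstract names a discrete wavelet transform as a lossless multi-resolution tokenizer. The wavelet family, number of decomposition levels, and token layout WaveLiT uses are not stated here, so what follows is only a minimal sketch. It assumes a single-level orthonormal 2D Haar transform on one scalar field, and the function names `haar_dwt2`, `haar_idwt2`, and `wavelet_tokens` are hypothetical. It shows why such a tokenizer can be lossless: the four half-resolution subbands hold exactly as many values as the input grid, and the inverse transform recovers the input up to floating-point rounding.

```python
from typing import List, Tuple

Grid = List[List[float]]


def haar_dwt2(x: Grid) -> Tuple[Grid, Grid, Grid, Grid]:
    """One level of an orthonormal 2D Haar transform (assumed wavelet).

    Requires even grid dimensions. Returns (LL, LH, HL, HH) subbands,
    each at half resolution along both axes.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("grid dimensions must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        rows = ([], [], [], [])
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # coarse (scaled average)
            rows[1].append((a - b + c - d) / 2)  # left-minus-right detail
            rows[2].append((a + b - c - d) / 2)  # top-minus-bottom detail
            rows[3].append((a - b - c + d) / 2)  # diagonal detail
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


def haar_idwt2(ll: Grid, lh: Grid, hl: Grid, hh: Grid) -> Grid:
    """Exact inverse of haar_dwt2: rebuilds each 2x2 block."""
    h, w = len(ll), len(ll[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, u, v, t = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + u + v + t) / 2
            x[2 * i][2 * j + 1] = (s - u + v - t) / 2
            x[2 * i + 1][2 * j] = (s + u - v - t) / 2
            x[2 * i + 1][2 * j + 1] = (s - u - v + t) / 2
    return x


def wavelet_tokens(x: Grid) -> List[List[float]]:
    """Hypothetical tokenizer: one 4-vector (LL, LH, HL, HH) per 2x2 patch."""
    bands = haar_dwt2(x)
    h, w = len(bands[0]), len(bands[0][0])
    return [[band[i][j] for band in bands] for i in range(h) for j in range(w)]


if __name__ == "__main__":
    field = [[float(4 * i + j + 1) for j in range(4)] for i in range(4)]
    tokens = wavelet_tokens(field)
    recon = haar_idwt2(*haar_dwt2(field))
    print(len(tokens), tokens[0])
    print(max(abs(p - q) for rp, rq in zip(field, recon) for p, q in zip(rp, rq)))
```

Applying `haar_dwt2` again to the LL band gives the next coarser level, which is one common way to build a multi-resolution hierarchy; whether WaveLiT stacks levels this way is not specified above. Because the transform is orthonormal, it also preserves the squared norm of the field, which is one reason a loss computed on wavelet coefficients can be compared directly with a loss on the original grid.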

Shyam Sankaran, Hanwen Wang, Paris Perdikaris • 2026

Related benchmarks

Task | Dataset (split) | Metric | Result | Rank
---- | --------------- | ------ | ------ | ----
Autoregressive rollout prediction | TheWell ASM, steps 21-60 (test) | VRMSE | 0.0268 | 20
One-step prediction | TheWell ASM* 1.0 (test) | VRMSE | 0.0016 | 12
One-step prediction | TheWell HS 1.0 (test) | VRMSE | 3.00e-4 | 12
One-step prediction | TheWell RB 1.0 (test) | VRMSE | 0.0065 | 12
One-step prediction | TheWell SF 1.0 (test) | VRMSE | 0.0015 | 12
One-step prediction | TheWell TRL2D 1.0 (test) | VRMSE | 0.1167 | 12
One-step prediction | TheWell AM 1.0 (test) | VRMSE | 0.0114 | 12
One-step prediction | TheWell VI 1.0 (test) | VRMSE | 0.0301 | 12
One-step prediction | TheWell GS 1.0 (test) | VRMSE | 6.00e-4 | 11
Autoregressive rollout prediction | TheWell HS, steps 21-46 (test) | VRMSE | 0.002 | 8

(10 of 40 rows shown.)
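Every result listed here is a VRMSE (variance-scaled root-mean-squared error), where lower is better. As we understand the definition used by The Well, the mean squared error is divided by the spatial variance of the target before taking the square root, so a constant prediction equal to the target's spatial mean scores about 1. The sketch below assumes a single flattened scalar field and a small stabilizer ε = 1e-7 in the denominator (an assumed value); the function name `vrmse` is illustrative, and the benchmark's exact averaging over fields, channels, and time steps is not reproduced.

```python
import math
from typing import Sequence


def vrmse(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Variance-scaled RMSE for one flattened field (assumed formulation).

    sqrt( mean((pred - target)^2) / (mean((target - mean(target))^2) + eps) )
    """
    if len(pred) != len(target) or not target:
        raise ValueError("pred and target must be non-empty and equal length")
    n = len(target)
    mean_t = sum(target) / n
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    var = sum((t - mean_t) ** 2 for t in target) / n
    return math.sqrt(mse / (var + eps))


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0, 4.0]
    print(vrmse(target, target))              # perfect prediction
    print(vrmse([2.5] * 4, target))           # predicting the mean, close to 1
    print(vrmse([t + 0.1 for t in target], target))
```

Normalizing by the target's variance makes scores comparable across fields with very different magnitudes, which is why a one-step VRMSE of 0.1167 on TRL2D can be read directly against 3.00e-4 on HS even though the underlying physical quantities differ.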
