
Choose a Transformer: Fourier or Galerkin

About

In this paper, we apply, for the first time, the self-attention of the state-of-the-art Transformer from "Attention Is All You Need" to a data-driven operator learning problem related to partial differential equations. We aim to explain the heuristics behind the attention mechanism and to improve its efficacy. By employing the operator approximation theory in Hilbert spaces, it is demonstrated for the first time that the softmax normalization in the scaled dot-product attention is sufficient but not necessary. Without softmax, the approximation capacity of a linearized Transformer variant can be proved to be comparable, layer by layer, to a Petrov-Galerkin projection, and the estimate is independent of the sequence length. A new layer normalization scheme mimicking the Petrov-Galerkin projection is proposed to allow a scaling to propagate through attention layers, which helps the model achieve remarkable accuracy in operator learning tasks with unnormalized data. Finally, we present three operator learning experiments: the viscous Burgers' equation, an interface Darcy flow, and an inverse interface coefficient identification problem. The newly proposed simple attention-based operator learner, the Galerkin Transformer, shows significant improvements in both training cost and evaluation accuracy over its softmax-normalized counterparts.
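The contrast between softmax-normalized and softmax-free attention described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the softmax-free (Galerkin-type) variant takes the form Q(K̃ᵀṼ)/n, where the tilde denotes layer normalization of the keys and values and n is the sequence length. That formula, the function names, and the random test data are illustrative assumptions and are not taken from the text above.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row (the features of one grid point) to zero mean, unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def galerkin_attention(q, k, v):
    """Softmax-free attention, assumed form: Q (K~^T V~) / n."""
    n = q.shape[0]
    k_n, v_n = layer_norm(k), layer_norm(v)
    # Forming the (d x d) product K~^T V~ first keeps the cost linear in n.
    return q @ (k_n.T @ v_n) / n

def softmax_attention(q, k, v):
    """Standard scaled dot-product attention, for comparison (quadratic in n)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Hypothetical data: n grid points, d features per point.
rng = np.random.default_rng(0)
n, d = 512, 16
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out_g = galerkin_attention(q, k, v)
out_s = softmax_attention(q, k, v)
```

In this sketch the softmax-free version never builds the n-by-n attention matrix, which is one plausible reading of the reduced training cost mentioned in the abstract.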

Shuhao Cao • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Inverse coefficient identification | Inverse Problem 5.3 nf, nc = 141, 36 | Relative Error | 0.018 | 30 |
| Inverse coefficient identification | Inverse Problem 5.3 nf, nc = 211, 71 | Relative Error (x10^-2) | 0.0152 | 30 |
| Forward PDE solving | Airfoil | Relative L2 | 1.18 | 21 |
| Forward PDE solving | Plasticity | Relative L2 Error | 0.012 | 21 |
| Forward PDE solving | Pipe | Relative L2 Error | 0.0098 | 20 |
| Forward PDE solving | Elasticity | Relative L2 Error | 0.024 | 19 |
| PDE solving | Navier-Stokes Regular Grid (test) | Relative L2 Error | 0.1401 | 16 |
| PDE solving | Darcy Regular Grid (test) | Relative L2 Error | 0.0084 | 16 |
| Operator learning | Navier-Stokes Regular Grid (test) | Relative L2 Error | 0.1401 | 15 |
| CFD field reconstruction | ShapeNet Car (test) | Volume Error | 3.39 | 15 |
Showing 10 of 45 rows
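
Most rows above report a relative L2 error. As a rough guide to how such a number is typically computed, here is a minimal sketch assuming the usual definition, the norm of the prediction error divided by the norm of the target, averaged over samples. The exact definition used by each benchmark may differ, and the function name and toy arrays are hypothetical.

```python
import numpy as np

def relative_l2_error(pred, target):
    """Mean over samples of ||pred - target||_2 / ||target||_2 (assumed definition)."""
    pred = np.asarray(pred, dtype=float).reshape(len(pred), -1)
    target = np.asarray(target, dtype=float).reshape(len(target), -1)
    num = np.linalg.norm(pred - target, axis=1)
    den = np.linalg.norm(target, axis=1)
    return float(np.mean(num / den))

# Toy data: two samples with two values each (not benchmark data).
target = np.array([[3.0, 4.0], [0.0, 2.0]])
pred = np.array([[3.0, 3.0], [0.0, 2.0]])
err = relative_l2_error(pred, target)
```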

