
Choose a Transformer: Fourier or Galerkin

About

In this paper, we apply the self-attention of the state-of-the-art Transformer from "Attention Is All You Need" for the first time to a data-driven operator learning problem related to partial differential equations. We aim both to explain the heuristics of the attention mechanism and to improve its efficacy. By employing operator approximation theory in Hilbert spaces, we demonstrate for the first time that the softmax normalization in scaled dot-product attention is sufficient but not necessary. Without softmax, the approximation capacity of a linearized Transformer variant can be proved to be comparable, layer-wise, to a Petrov-Galerkin projection, and the estimate is independent of the sequence length. A new layer normalization scheme mimicking the Petrov-Galerkin projection is proposed to allow a scaling to propagate through attention layers, which helps the model achieve remarkable accuracy in operator learning tasks with unnormalized data. Finally, we present three operator learning experiments: the viscous Burgers' equation, an interface Darcy flow, and an inverse interface coefficient identification problem. The newly proposed simple attention-based operator learner, the Galerkin Transformer, shows significant improvements in both training cost and evaluation accuracy over its softmax-normalized counterparts.
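To make the contrast between softmax-normalized and softmax-free attention concrete, here is a minimal NumPy sketch. It assumes the softmax-free variant takes the form Q(K̃ᵀṼ)/n, with K̃ and Ṽ layer-normalized and n the sequence length. That exact form, the function names, and the omission of learnable projection and normalization weights are assumptions made for illustration, not details stated in the abstract above.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance over the feature axis.
    Learnable scale and shift are omitted (a simplification)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax_attention(q, k, v):
    """Standard scaled dot-product attention; builds an (n, n) weight matrix."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def galerkin_attention(q, k, v):
    """Softmax-free attention, assumed form Q (K~^T V~) / n.
    Forming the (d, d) product K~^T V~ first avoids the (n, n) matrix,
    so the cost grows linearly in the sequence length n."""
    n = q.shape[0]
    return q @ (layer_norm(k).T @ layer_norm(v)) / n

# Hypothetical usage: n = 16 grid points, d = 4 feature channels.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 4)) for _ in range(3))
out = galerkin_attention(q, k, v)
print(out.shape)
```

Because no softmax sits between the two matrix products, they can be regrouped by associativity, which is what lets the (d, d) product be computed first.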

Shuhao Cao • 2021

Related benchmarks

Task                       Dataset                                                       Metric                    Result  Rank
PDE solving                Darcy                                                         Relative L2 Error         0.0084  46
Forward PDE solving        Elasticity                                                    Relative L2 Error         0.024   44
PDE solving                Navier-Stokes Regular Grid (test)                             Relative L2 Error         0.1401  41
PDE solving                Darcy Regular Grid (test)                                     Relative L2 Error         0.0084  41
PDE solving                Airfoil Structured Mesh (test)                                Relative L2 Error         0.0118  38
PDE solving                Pipe Structured Mesh (test)                                   Relative L2 Error         0.0098  38
Forward PDE solving        Plasticity                                                    Relative L2 Error         0.012   36
Forward PDE solving        Airfoil                                                       Relative L2               1.18    36
Forward PDE solving        Pipe                                                          Relative L2 Error         0.0098  35
Fluid Dynamics Simulation  Navier-Stokes (NS) nu=10^-5 at 64x64 unified-protocol (test)  Relative L2 Error (Test)  10.97   31
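The benchmark entries above report a relative L2 error. A minimal sketch of how such a metric is commonly computed follows, assuming the usual definition ||pred - target||_2 / ||target||_2 on a single sample. The function name is hypothetical, and individual benchmarks may average per-sample errors or scale the value differently.

```python
import numpy as np

def relative_l2_error(pred, target):
    """Relative L2 error: norm of the residual divided by norm of the target.
    Assumes a single sample; arrays are flattened by np.linalg.norm."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.linalg.norm(pred - target) / np.linalg.norm(target)

# Worked example: target norm is 5, residual norm is 0.5, so the error is 0.1.
err = relative_l2_error([3.0, 4.5], [3.0, 4.0])
print(err)
```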
