Choose a Transformer: Fourier or Galerkin

About

In this paper, we apply the self-attention from the state-of-the-art Transformer in "Attention Is All You Need" for the first time to a data-driven operator learning problem related to partial differential equations. An effort is made to explain the heuristics of, and to improve the efficacy of, the attention mechanism. By employing the operator approximation theory in Hilbert spaces, it is demonstrated for the first time that the softmax normalization in the scaled dot-product attention is sufficient but not necessary. Without softmax, the approximation capacity of a linearized Transformer variant can be proved to be comparable to a Petrov-Galerkin projection layer-wise, and the estimate is independent of the sequence length. A new layer normalization scheme mimicking the Petrov-Galerkin projection is proposed to allow a scaling to propagate through attention layers, which helps the model achieve remarkable accuracy in operator learning tasks with unnormalized data. Finally, we present three operator learning experiments, including the viscid Burgers' equation, an interface Darcy flow, and an inverse interface coefficient identification problem. The newly proposed simple attention-based operator learner, Galerkin Transformer, shows significant improvements in both training cost and evaluation accuracy over its softmax-normalized counterparts.
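To make the contrast in the abstract concrete, here is a minimal NumPy sketch that places standard softmax attention next to a softmax-free, Galerkin-type variant. It is an illustration under stated assumptions, not the author's implementation: it uses a single head, omits the learned projections, applies a plain layer normalization without learnable parameters to the keys and values, and divides by the sequence length n. The function names (`layer_norm`, `softmax_attention`, `galerkin_attention`) are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned scale or shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax_attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V, cost O(n^2 d)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def galerkin_attention(q, k, v):
    """Softmax-free attention (assumed form): Q (LN(K)^T LN(V)) / n, cost O(n d^2)."""
    n = q.shape[0]
    # Forming the d x d product first avoids building the n x n attention matrix.
    return q @ (layer_norm(k).T @ layer_norm(v)) / n

rng = np.random.default_rng(0)
n, d = 256, 16  # sequence length (number of grid points) and feature dimension
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))

out_softmax = softmax_attention(q, k, v)
out_galerkin = galerkin_attention(q, k, v)
```

Because no nonlinearity sits between the two matrix products, the softmax-free variant can multiply the keys and values first, so its cost grows linearly in the sequence length rather than quadratically. This matches the abstract's claim of lower training cost.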

Shuhao Cao • 2021

Related benchmarks

Task	Dataset	Result
PDE solving	Darcy	Relative L2 Error: 0.0084	46
Forward PDE solving	Elasticity	Relative L2 Error: 0.024	44
PDE solving	Navier-Stokes Regular Grid (test)	Relative L2 Error: 0.1401	41
PDE solving	Darcy Regular Grid (test)	Relative L2 Error: 0.0084	41
PDE solving	Airfoil Structured Mesh (test)	Relative L2 Error: 0.0118	38
PDE solving	Pipe Structured Mesh (test)	Relative L2 Error: 0.0098	38
Forward PDE solving	Plasticity	Relative L2 Error: 0.012	36
Forward PDE solving	Airfoil	Relative L2: 1.18	36
Forward PDE solving	Pipe	Relative L2 Error: 0.0098	35
Fluid Dynamics Simulation	Navier-Stokes (NS) nu=10^-5 at 64x64 unified-protocol (test)	Relative L2 Error (Test): 10.97	31

Showing 10 of 99 rows
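The metric reported in the table can be sketched as follows. This is a minimal example that assumes the relative L2 error is the Euclidean norm of the difference between prediction and reference, divided by the norm of the reference, on a single sample. Individual benchmarks may average over samples or scale the value differently, so the hypothetical helper `relative_l2_error` is illustrative only.

```python
import numpy as np

def relative_l2_error(pred, target):
    """Return ||pred - target||_2 / ||target||_2 over all entries of one sample."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    return np.linalg.norm(pred - target) / np.linalg.norm(target)

# Toy example: the reference has norm 5 and the prediction is off by 0.5 in one entry.
target = np.array([3.0, 4.0])
pred = np.array([3.0, 4.5])
error = relative_l2_error(pred, target)  # 0.5 / 5 = 0.1
```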
