The Phasor Transformer: Resolving Attention Bottlenecks on the Unit Circle

About

Transformer models have redefined sequence learning, yet dot-product self-attention introduces a quadratic token-mixing bottleneck for long-context time series. We introduce the Phasor Transformer block, a phase-native alternative that represents sequence states on the unit-circle manifold $S^1$. Each block combines lightweight trainable phase shifts with parameter-free Discrete Fourier Transform (DFT) token coupling, achieving global $\mathcal{O}(N\log N)$ mixing without explicit attention maps. Stacking these blocks defines the Large Phasor Model (LPM). We validate LPM on autoregressive time-series prediction over synthetic multi-frequency benchmarks against honest baselines: it beats a zero-parameter persistence baseline and, with the corrected gradient path, improves monotonically with depth before saturating, while remaining competitive with, but not superior to, self-attention at a fraction of the parameter count. Our results establish an explicit efficiency-accuracy frontier, showing that scalable temporal modeling in oscillatory domains can emerge from geometry-constrained phase computation with deterministic global coupling.
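The block described in the abstract (a trainable phase shift followed by parameter-free DFT token coupling) can be illustrated with a rough NumPy sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the names `phasor_block` and `lpm_forward`, the per-token shift vector `phi`, the orthonormal FFT scaling, and in particular the use of `np.angle` to project the mixed signal back onto $S^1$.

```python
import numpy as np

def phasor_block(theta, phi):
    """One hypothetical phasor block: phase shift, then DFT mixing.

    theta: array of N token phases in radians (points on the unit circle).
    phi:   array of N trainable phase shifts (assumed one per token).
    """
    # Embed the shifted phases on the unit circle as complex phasors.
    z = np.exp(1j * (theta + phi))
    # Parameter-free global token coupling; the FFT costs O(N log N).
    mixed = np.fft.fft(z, norm="ortho")
    # Assumed projection back to S^1: keep the phase, drop the magnitude.
    return np.angle(mixed)

def lpm_forward(theta, phis):
    """Stack blocks, one shift vector per layer (illustrative only)."""
    for phi in phis:
        theta = phasor_block(theta, phi)
    return theta

rng = np.random.default_rng(0)
N = 32       # sequence length, matching the N=32 benchmarks listed below
depth = 3    # number of stacked blocks, chosen arbitrarily
theta = rng.uniform(-np.pi, np.pi, size=N)
phis = [rng.normal(scale=0.1, size=N) for _ in range(depth)]
out = lpm_forward(theta, phis)
```

In this sketch the only trainable parameters are the `depth * N` phase shifts; the mixing step itself has none, which is consistent with the low parameter count the abstract claims relative to self-attention.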

Dibakar Sigdel • 2026

Related benchmarks

Task	Dataset	Result	Rank
Sequence Global Correlation Prediction	Sequence Global Correlation N=32 (test)	MAE 0.1817	2
Sequence regression	Synthetic autoregressive multi-frequency sequences N=32 (test)	Test MSE 0.07	2

