
PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer

About

This paper introduces the Polynomial Mixer (PoM), a novel token mixing mechanism with linear complexity that serves as a drop-in replacement for self-attention. PoM aggregates input tokens into a compact representation through a learned polynomial function, from which each token retrieves contextual information. We prove that PoM satisfies the contextual mapping property, ensuring that transformers equipped with PoM remain universal sequence-to-sequence approximators. We replace standard self-attention with PoM across five diverse domains: text generation, handwritten text recognition, image generation, 3D modeling, and Earth observation. PoM matches the performance of attention-based models while drastically reducing computational cost when working with long sequences. The code is available at https://github.com/davidpicard/pom.
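The mechanism described above (aggregate all tokens into one compact summary via a polynomial feature map, then let each token read context from it) can be sketched in a few lines. This is a hypothetical illustration of the general idea, not the paper's exact formulation: the feature map (elementwise powers up to a chosen degree), the mean aggregation, and the gating projection are all assumptions made for clarity. Note the cost is linear in sequence length, unlike quadratic self-attention.

```python
import numpy as np

def pom_mix(x, W1, W2, Wo, degree=2):
    """Sketch of a polynomial token mixer (illustrative, not the official PoM).

    x : (n, d) token embeddings.
    Tokens are projected, expanded with elementwise polynomial powers,
    and averaged over the sequence into a single fixed-size state;
    each token then retrieves contextual information from that state
    through a learned gate. Total cost is O(n * d), linear in n.
    """
    h = x @ W1                                              # (n, d_h) per-token projection
    # simple polynomial feature map: concatenate powers 1..degree
    feats = np.concatenate([h ** p for p in range(1, degree + 1)], axis=-1)
    state = feats.mean(axis=0)                              # (degree * d_h,) global summary
    gate = x @ W2                                           # (n, degree * d_h) per-token gate
    mixed = gate * state                                    # each token reads from the state
    return mixed @ Wo                                       # (n, d) back to model dimension

# toy shapes: d=8 model dim, d_h=4 hidden dim, degree=2, n=16 tokens
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(8, 8))    # 8 = degree * d_h
Wo = rng.normal(size=(8, 8))
y = pom_mix(x, W1, W2, Wo)      # y.shape == (16, 8)
```

Because the sequence is collapsed into a fixed-size state before tokens read from it, doubling the sequence length doubles the work instead of quadrupling it, which is what makes PoM attractive for long sequences.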

David Picard, Nicolas Dufour, Lucas Degeorge, Arijit Ghosh, Davide Allegro, Tom Ravaud, Yohann Perron, Corentin Sautier, Zeynep Sonat Baltaci, Fei Meng, Syrine Kalleli, Marta López-Rauhut, Thibaut Loiseau, Ségolène Albouy, Raphael Baena, Elliot Vincent, Loic Landrieu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Commonsense Reasoning | HellaSwag | – | 1891 |
| Class-conditional Image Generation | ImageNet 256x256 (train) | – | 345 |
| Language Modeling | FineWeb (val) | Validation Loss: 3.31 | 159 |
| Commonsense Reasoning | WinoGrande | Accuracy: 51.9 | 78 |
| Multitask Language Understanding | MMLU | Accuracy: 25.6 | 34 |
| 3D Semantic Segmentation | ScanNet | mIoU: 76.8 | 27 |
| Question Answering | ARC-E | Normalized Accuracy: 29 | 19 |
| 3D Point Cloud Segmentation | SemanticKITTI | mIoU: 67.5 | 3 |
| Optical Character Recognition | Ludovico Antonio Muratori (LAM) single-line | CER: 2.8 | 3 |
| Optical Character Recognition | Ludovico Antonio Muratori (LAM) multi-line | CER: 3.3 | 3 |

Showing 10 of 11 rows.
