PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer
About
This paper introduces the Polynomial Mixer (PoM), a novel token mixing mechanism with linear complexity that serves as a drop-in replacement for self-attention. PoM aggregates input tokens into a compact representation through a learned polynomial function, from which each token retrieves contextual information. We prove that PoM satisfies the contextual mapping property, ensuring that transformers equipped with PoM remain universal sequence-to-sequence approximators. We replace standard self-attention with PoM across five diverse domains: text generation, handwritten text recognition, image generation, 3D modeling, and Earth observation. PoM matches the performance of attention-based models while drastically reducing computational cost when working with long sequences. The code is available at https://github.com/davidpicard/pom.
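The mixing idea described above — summarizing all tokens into one compact polynomial state and letting each token read from it — can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's exact formulation: the element-wise monomial aggregation, the sigmoid gate, and the projections `Wq`/`Wo` are assumptions made for clarity; consult the linked repository for the actual PoM definition.

```python
import numpy as np

def polynomial_mixer(X, Wq, Wo, degree=2):
    """Hypothetical PoM-style mixing with linear complexity in sequence length.

    X: (n, d) token matrix. The global state is a sum of element-wise
    monomials of the tokens (cost O(n * d * degree)), so no n x n
    attention matrix is ever formed.
    """
    # Aggregate all tokens into one shared state: one d-vector per degree.
    state = np.concatenate([(X ** p).sum(axis=0) for p in range(1, degree + 1)])
    # Each token retrieves context by gating the shared state with its
    # own projected query (sigmoid gate, an assumption of this sketch).
    gate = 1.0 / (1.0 + np.exp(-(X @ Wq)))   # (n, degree * d)
    return (gate * state) @ Wo               # (n, d) mixed tokens

rng = np.random.default_rng(0)
n, d, degree = 6, 4, 2
X = rng.standard_normal((n, d))
Wq = rng.standard_normal((d, degree * d)) * 0.1
Wo = rng.standard_normal((degree * d, d)) * 0.1
Y = polynomial_mixer(X, Wq, Wo, degree)
```

Because the state has a fixed size independent of `n`, doubling the sequence length only doubles the cost, in contrast with the quadratic growth of self-attention.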
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | -- | -- | 1891 |
| Class-conditional Image Generation | ImageNet 256x256 (train) | -- | -- | 345 |
| Language Modeling | FineWeb (val) | Validation Loss | 3.31 | 159 |
| Commonsense Reasoning | WinoGrande | Accuracy | 51.9 | 78 |
| Multitask Language Understanding | MMLU | Accuracy | 25.6 | 34 |
| 3D Semantic Segmentation | ScanNet | mIoU | 76.8 | 27 |
| Question Answering | ARC-E | Normalized Accuracy | 29 | 19 |
| 3D Point Cloud Segmentation | SemanticKITTI | mIoU | 67.5 | 3 |
| Optical Character Recognition | Ludovico Antonio Muratori (LAM) (single-line) | CER | 2.8 | 3 |
| Optical Character Recognition | Ludovico Antonio Muratori (LAM) (multi-line) | CER | 3.3 | 3 |