Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models

About

Modern vision backbones treat pointwise activations (e.g., ReLU, GELU) and exponential softmax as essential sources of nonlinearity, but we demonstrate they are not required within MetaFormer-style vision backbones. We design activation-free polynomial alternatives for three core primitives (MLPs, convolutions, and attention), where Hadamard products replace standard nonlinearities to yield polynomial functions of the input. These modules integrate seamlessly into existing architectures: instantiated within MetaFormer, a modular framework for vision backbones, our PolyNeXt models match or exceed activation-based counterparts across model scales on ImageNet classification, ADE20K semantic segmentation, and out-of-distribution robustness. We also substantially outperform prior polynomial networks at reduced computational cost, showing that polynomial variants of standard modules beat complex custom architectures.
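As a rough illustration of how a Hadamard product can stand in for a pointwise activation, the sketch below builds a small activation-free MLP block in plain Python. It is not the PolyNeXt implementation: the function names (`linear`, `hadamard_mlp`, `random_matrix`), the two-branch layout, the layer sizes, and the omission of biases are all assumptions made for illustration. Two linear projections of the same input are multiplied elementwise, so every output is a degree-2 polynomial of the input.

```python
import random


def linear(weights, x):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def hadamard_mlp(x, w_a, w_b, w_out):
    """Hypothetical activation-free block: (W_a x) * (W_b x), then W_out.

    The elementwise (Hadamard) product replaces ReLU/GELU, so the output
    is a degree-2 polynomial of x. No biases are used (an assumption).
    """
    a = linear(w_a, x)
    b = linear(w_b, x)
    hidden = [ai * bi for ai, bi in zip(a, b)]  # Hadamard product
    return linear(w_out, hidden)


def random_matrix(rows, cols, rng):
    """Random weights in [-1, 1]; a stand-in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, hidden_dim = 4, 8  # illustrative sizes only
w_a = random_matrix(hidden_dim, dim, rng)
w_b = random_matrix(hidden_dim, dim, rng)
w_out = random_matrix(dim, hidden_dim, rng)

x = [0.5, -1.0, 2.0, 0.25]
y = hadamard_mlp(x, w_a, w_b, w_out)

# Without biases the block is homogeneous of degree 2: scaling the input
# by 3 scales every output by 3**2 = 9.
y_scaled = hadamard_mlp([3.0 * xi for xi in x], w_a, w_b, w_out)
```

Because this sketch has no biases, the quadratic scaling of `y_scaled` relative to `y` is an easy check that the block is a polynomial rather than a piecewise-linear function; stacking several such blocks would raise the polynomial degree.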

Jeffrey Wang, Jonathan Gregory, Grigorios G. Chrysos • 2026

Related benchmarks

Task	Dataset	Metric	Value	(unlabeled)
Image Classification	CIFAR-10	--	--	973
Image Classification	ImageNet-Sketch	Top-1 Accuracy	41.8	491
Image Classification	SVHN	Top-1 Accuracy	98.1	209
Semantic segmentation	ADE20K	mIoU	50.6	90
Robustness	ImageNet-C	mCE	42.5	30
Robustness Image Classification	ImageNet-A	Top-1 Accuracy	49.2	18
Robustness Image Classification	ImageNet-R	Top-1 Accuracy	54.5	18

