RelFlexformer: Efficient Attention 3D-Transformers for Integrable Relative Positional Encodings

About

We present a new class of efficient attention mechanisms applying universal 3D Relative Positional Encoding (RPE) methods given by arbitrary integrable modulation functions $f$. They lead to a new class of 3D-Transformer models, called \textit{RelFlexformers}, which flexibly integrate those RPEs and are characterized by $O(L \log L)$ time complexity of the attention computation for input sequences of length $L$. RelFlexformers build on the theory of the Non-Uniform Fourier Transform (NU-FFT), naturally generalizing several existing efficient RPE-attention methods from structured settings, with tokens homogeneously embedded in unweighted grids, to general non-structured heterogeneous scenarios, where the tokens' positions are arbitrarily distributed in the corresponding 3D spaces. As such, RelFlexformers can be applied, in particular, to model point clouds. Our extensive empirical evaluation on a large portfolio of 3D datasets confirms the quality improvements provided by the NU-FFT-driven attention modulation techniques in RelFlexformers.
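To make the notion of RPE-modulated attention concrete, the sketch below computes a brute-force $O(L^2)$ reference in NumPy. It is not the RelFlexformer algorithm: the abstract does not spell out the exact form of the modulation, so the code assumes one common convention in which the unnormalized attention matrix is multiplied elementwise by $f(r_i - r_j)$ for token positions $r_i \in \mathbb{R}^3$. The Gaussian choice of $f$ and the names `gaussian_modulation` and `rpe_attention_bruteforce` are hypothetical and used for illustration only. An NU-FFT-based method would aim to avoid materializing the $L \times L$ matrices built here.

```python
import numpy as np


def gaussian_modulation(delta, sigma=1.0):
    """Hypothetical integrable modulation f: R^3 -> R (a Gaussian, for illustration).

    delta has shape (..., 3); the result has shape (...).
    """
    return np.exp(-np.sum(delta ** 2, axis=-1) / (2.0 * sigma ** 2))


def rpe_attention_bruteforce(Q, K, V, positions, f=gaussian_modulation):
    """Quadratic-time reference for attention modulated by f(r_i - r_j).

    Assumed convention (not taken from the paper):
        W_ij = exp(q_i . k_j / sqrt(d)) * f(r_i - r_j), rows normalized to sum to 1.
    Q, K: (L, d); V: (L, d_v); positions: (L, 3) with arbitrary 3D coordinates.
    """
    L, d = Q.shape
    logits = Q @ K.T / np.sqrt(d)
    # Subtracting the row maximum cancels after row normalization and avoids overflow.
    logits = logits - logits.max(axis=1, keepdims=True)
    A = np.exp(logits)                                     # (L, L) unnormalized attention
    delta = positions[:, None, :] - positions[None, :, :]  # (L, L, 3) pairwise offsets
    M = f(delta)                                           # (L, L) RPE modulation mask
    W = A * M
    W = W / W.sum(axis=1, keepdims=True)                   # row-normalize modulated weights
    return W @ V


rng = np.random.default_rng(0)
L, d = 6, 4
Q = rng.normal(size=(L, d))
K = rng.normal(size=(L, d))
V = rng.normal(size=(L, d))
positions = rng.uniform(size=(L, 3))  # unstructured 3D points, e.g. a tiny point cloud
out = rpe_attention_bruteforce(Q, K, V, positions)
```

With a constant modulation ($f \equiv 1$) this reduces to ordinary softmax attention, which is a convenient sanity check for any faster approximation of the same quantity.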

Byeongchan Kim, Arijit Sehanobish, Avinava Dubey, Min-hwan Oh, Krzysztof Choromanski • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Semantic segmentation	S3DIS (Area 5)	mIoU	72.1	1029
Semantic segmentation	ScanNet V2 (val)	mIoU	76.9	380
Semantic segmentation	nuScenes (val)	mIoU (Segmentation)	0.812	323
Semantic segmentation	SUN RGB-D	mIoU	51	85
Object Classification	ModelNet40	Overall Accuracy	92.9	78
Semantic segmentation	NYU Depth V2	mIoU	55.3	56
Classification	ScanObjectNN v2	Overall Accuracy (OA)	84.5	17

