TwistNet-2D: Learning Second-Order Channel Interactions via Spiral Twisting for Texture Recognition
About
Second-order feature statistics are central to texture recognition, yet existing mechanisms exhibit a structural tension: bilinear pooling and Gram matrices capture global channel correlations but discard spatial structure, whereas self-attention models capture cross-position relations through weighted sums rather than explicit pairwise products. We propose TwistNet-2D, a lightweight module that computes local pairwise channel products under directional spatial displacement, jointly encoding where features co-occur and how they interact. The core component, Spiral-Twisted Channel Interaction (STCI), shifts one feature map along a prescribed direction before L2-normalized channel multiplication, capturing cross-position co-occurrence patterns that characterize structured and periodic textures. Four directional heads are aggregated through content-adaptive channel reweighting, and the result is injected via a sigmoid-gated residual path with near-zero initialization. TwistNet-2D adds only approximately 3.5% parameters and approximately 2% FLOPs over ResNet-18. To isolate the contribution of architectural inductive bias from that of transfer learning, all models in this study are trained from scratch without ImageNet pretraining. Under this protocol, TwistNet-2D consistently surpasses parameter-matched baselines and substantially larger ConvNeXt and Swin Transformer backbones across four texture and fine-grained recognition benchmarks, while the multi-head structure produces interpretable, orientation-selective representations that align with classical texture analysis.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Fine-grained Image Classification | CUB200 2011 (test) | Accuracy61.8 | 567 | |
| Material Classification | FMD (test) | Mean Accuracy43.5 | 52 | |
| Texture Classification | DTD (test) | Accuracy45.8 | 19 | |
| Fine-grained Image Classification | Flowers-102 (test) | Mean Accuracy58.5 | 8 |