NeXt2Former-CD: Efficient Remote Sensing Change Detection with Modern Vision Architectures

About

State Space Models (SSMs) have recently gained traction in remote sensing change detection (CD) for their favorable scaling properties. In this paper, we explore the potential of modern convolutional and attention-based architectures as a competitive alternative. We propose NeXt2Former-CD, an end-to-end framework that integrates a Siamese ConvNeXt encoder initialized with DINOv3 weights, a deformable attention-based temporal fusion module, and a Mask2Former decoder. This design is intended to better tolerate residual co-registration noise and small object-level spatial shifts, as well as semantic ambiguity in bi-temporal imagery. Experiments on LEVIR-CD, WHU-CD, and CDD datasets show that our method achieves the best results among the evaluated methods, improving over recent Mamba-based baselines in both F1 score and IoU. Furthermore, despite a larger parameter count, our model maintains inference latency comparable to SSM-based approaches, suggesting it is practical for high-resolution change detection tasks.

Yufan Wang, Sokratis Makrogiannis, Chandra Kambhamettu • 2026

Related benchmarks

Task	Dataset	Result
Change Detection	LEVIR-CD (test)	F1 Score: 92.1	485
Change Detection	WHU-CD (test)	IoU: 91.4	380
Change Detection	CDD (test)	F1 Score: 98.4	88
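
For readers less familiar with the two metrics reported above, the following is a minimal sketch of how F1 score and IoU are commonly computed for the "change" class of a binary change mask. It assumes the standard pixel-wise definitions; the function name `change_metrics` and the toy inputs are illustrative only, and the paper's exact evaluation protocol is not described on this page.

```python
def change_metrics(pred, target):
    """Return (precision, recall, f1, iou) for the positive ("change") class.

    pred and target are equal-length flat sequences of 0/1 pixel labels.
    Hypothetical helper for illustration, not the authors' evaluation code.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # Count true positives, false positives and false negatives.
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return precision, recall, f1, iou


# Toy example: six pixels from a flattened predicted mask and its ground truth.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
precision, recall, f1, iou = change_metrics(pred, target)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} iou={iou:.3f}")
```

Under these definitions the two metrics are monotonically related by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU on the same predictions. The scores in the table are reported on different datasets, so they should not be compared with each other directly.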
