SceneMixer: Exploring Convolutional Mixing Networks for Remote Sensing Scene Classification

About

Remote sensing scene classification plays a key role in Earth observation by enabling the automatic identification of land use and land cover (LULC) patterns from aerial and satellite imagery. Despite recent progress with convolutional neural networks (CNNs) and vision transformers (ViTs), the task remains challenging due to variations in spatial resolution, viewpoint, orientation, and background conditions, which often reduce the generalization ability of existing models. To address these challenges, this paper proposes a lightweight architecture based on the convolutional mixer paradigm. The model alternates between spatial mixing through depthwise convolutions at multiple scales and channel mixing through pointwise operations, enabling efficient extraction of both local and contextual information while keeping the number of parameters and computations low. Extensive experiments were conducted on the AID and EuroSAT benchmarks. The proposed model achieved overall accuracy, average accuracy, and Kappa values of 74.7%, 74.57%, and 73.79 on the AID dataset, and 93.90%, 93.93%, and 93.22 on EuroSAT, respectively. These results demonstrate that the proposed approach provides a good balance between accuracy and efficiency compared with widely used CNN- and transformer-based models. Code will be publicly available on: https://github.com/mqalkhatib/SceneMixer

Mohammed Q. Alkhatib, Ali Jamali, Swalpa Kumar Roy• 2025

Related benchmarks

Task	Dataset	Result
Remote Sensing Scene Classification	EuroSAT	--	15
Scene Classification	AID	OA74.7	7
Remote Sensing Scene Classification	AID (test)	Overall Accuracy74.7	1
Remote Sensing Scene Classification	EuroSAT (test)	Overall Accuracy93.9	1

Showing 4 of 4 rows

Other info

Follow for update

@wizwand_team Discord