Mask-aware inference with State-Space Models

About

Many real-world computer vision tasks, such as depth completion, must handle inputs with arbitrarily shaped regions of missing or invalid data. For Convolutional Neural Networks (CNNs), Partial Convolutions address this with a mask-aware re-normalization conditioned only on valid pixels. More recently, State Space Models (SSMs) such as Mamba have emerged, offering high performance with linear complexity. However, these architectures lack an inherent mechanism for handling arbitrarily shaped invalid data at inference time. To bridge this gap, we introduce Partial Vision Mamba (PVM), a novel architectural component that ports the principles of partial operations to the Mamba backbone. We also define a set of rules for designing architectures with PVM. We show the efficacy and generalizability of our approach on depth completion, image inpainting, and classification with invalid data.
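
For readers unfamiliar with the mask-aware re-normalization mentioned above, the following is a minimal NumPy sketch of a partial convolution as used in CNNs. It is not the PVM component itself, whose details are not given here. The function name `partial_conv2d`, the single-channel setup, stride 1, and the absence of padding are all simplifying assumptions made for illustration.

```python
import numpy as np

def partial_conv2d(x, mask, weight, bias=0.0):
    """Illustrative single-channel partial convolution (stride 1, no padding).

    x, mask: (H, W) arrays; mask is 1 for valid pixels and 0 for invalid ones.
    weight:  (k, k) kernel.
    Returns (out, new_mask), each of shape (H - k + 1, W - k + 1).
    """
    k = weight.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((out_h, out_w))
    new_mask = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            m = mask[i:i + k, j:j + k]
            valid = m.sum()
            if valid > 0:
                # Zero out invalid pixels so their values cannot leak in.
                patch = x[i:i + k, j:j + k] * m
                # Re-normalize by the fraction of valid pixels in the window.
                scale = (k * k) / valid
                out[i, j] = (weight * patch).sum() * scale + bias
                # A window with at least one valid pixel yields a valid output.
                new_mask[i, j] = 1.0
            # Otherwise the output stays 0 and remains marked invalid.
    return out, new_mask
```

With a mean filter (`weight = np.ones((3, 3)) / 9`) and a constant image, the re-normalized output equals that constant wherever the window contains at least one valid pixel, regardless of what values sit at the masked positions.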

Ignasi Mas, Ramon Morros, Javier Ruiz-Hidalgo, Ivan Huerta • 2026

Related benchmarks

Task	Dataset	Metric	Result	Rank
Inpainting	FFHQ	LPIPS	0.143	62
Image Classification	ImageNet 1k masked (50 samples per class) (test)	Top-5 Accuracy	34.93	2
