Family Matters: A Systematic Study of Spatial vs. Frequency Masking for Continual Test-Time Adaptation

About

Recent continual test-time adaptation (CTTA) methods adopt masked image modeling to stabilize learning under distribution shift, yet each treats its masking family F as a fixed design choice and innovates exclusively along the selection strategy S, leaving the family axis underexplored. We present a systematic empirical study that isolates this axis. Using a controlled CTTA instantiation -- Mask to Adapt (M2A) -- that fixes S = random and standard losses, we vary only F across spatial (patch, pixel) and frequency (all-band, low-band, high-band) families while keeping every other component identical. The study's contributions are the design guidance it extracts for the CTTA settings we evaluated: (1) the masking family determines whether adaptation compounds useful structure or compounds errors -- on patch-tokenized architectures, spatial masking accumulates stable representations over long streams while frequency masking collapses catastrophically. We characterize this instability through a structural-preservation account, where spatial coherence maintains the broad-spectrum redundancy needed to avoid terminally overlapping with a corruption's spectral signature; (2) the optimal family depends on architecture-task alignment -- on CNNs, whose overlapping receptive fields dilute patch occlusion, the family gap vanishes, whereas on fine-grained tasks with global cues and large-capacity ViTs, frequency masking becomes competitive. In confounded system-level comparisons -- where baselines also differ in losses and auxiliary components -- M2A's random selection performs comparably to heuristic strategies, though we treat this observation as suggestive context rather than a controlled quantification of S's relative importance.

Chandler Timm C. Doloriel, Yunbei Zhang, Yeonguk Yu, Taki Hasan Rafi, Muhammad Salman Siddiqui, Tor Kristian Stevik, Fadi Al Machot, Kristian Hovde Liland, Habib Ullah • 2025
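
To make the spatial vs. frequency distinction in the abstract concrete, the following is a minimal NumPy sketch of the five masking variants it names (patch, pixel, all-band, low-band, high-band), each with random selection. It is an illustration under stated assumptions, not the authors' M2A implementation: the function names, the zero-fill masking, the median-radius split between low and high bands, and the default ratios are all hypothetical choices made here for clarity.

```python
import numpy as np


def patch_mask(img, patch=4, ratio=0.5, rng=None):
    """Spatial family: zero a random subset of non-overlapping patches.

    Assumes img has shape (H, W) or (H, W, C) with H and W divisible by
    `patch` (an assumption of this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    gh, gw = h // patch, w // patch
    n_masked = int(round(ratio * gh * gw))
    keep = np.ones(gh * gw, dtype=bool)
    keep[rng.choice(gh * gw, size=n_masked, replace=False)] = False
    # Expand the patch-grid mask to pixel resolution.
    pixel_keep = keep.reshape(gh, gw).repeat(patch, axis=0).repeat(patch, axis=1)
    out = img.copy()
    out[~pixel_keep] = 0
    return out


def pixel_mask(img, ratio=0.5, rng=None):
    """Spatial family: zero individual pixels independently at random."""
    rng = np.random.default_rng() if rng is None else rng
    drop = rng.random(img.shape[:2]) < ratio
    out = img.copy()
    out[drop] = 0
    return out


def frequency_mask(img, ratio=0.5, band="all", rng=None):
    """Frequency family: zero random Fourier coefficients in a chosen band.

    band = "all", "low", or "high". The low/high split at the median radial
    frequency is an arbitrary choice for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    # Centered 2-D spectrum; the DC term sits at index (h // 2, w // 2).
    spec = np.fft.fftshift(np.fft.fft2(img, axes=(0, 1)), axes=(0, 1))
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    cutoff = np.median(radius)
    if band == "all":
        candidates = np.ones((h, w), dtype=bool)
    elif band == "low":
        candidates = radius <= cutoff
    elif band == "high":
        candidates = radius > cutoff
    else:
        raise ValueError(f"unknown band: {band!r}")
    drop = candidates & (rng.random((h, w)) < ratio)
    spec[drop] = 0
    restored = np.fft.ifft2(np.fft.ifftshift(spec, axes=(0, 1)), axes=(0, 1))
    # The random mask is not conjugate-symmetric, so the inverse transform is
    # complex in general; keeping only the real part is a simplification.
    return restored.real.astype(img.dtype)


# Hypothetical usage: only the family changes, the selection stays random.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
views = {
    "patch": patch_mask(image, patch=4, ratio=0.5, rng=rng),
    "pixel": pixel_mask(image, ratio=0.5, rng=rng),
    "all-band": frequency_mask(image, ratio=0.5, band="all", rng=rng),
    "low-band": frequency_mask(image, ratio=0.5, band="low", rng=rng),
    "high-band": frequency_mask(image, ratio=0.5, band="high", rng=rng),
}
```

Note the structural difference the sketch exposes: a spatial mask removes localized regions and leaves the rest of the image untouched, whereas a frequency mask perturbs every pixel at once, since each Fourier coefficient contributes to the whole image.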

Related benchmarks

Task	Dataset	Result
Image Classification	CIFAR10-C (test)	Accuracy (Gaussian): 17	65
Image Classification	ImageNet-C 1.0 (test)	Accuracy (Average): 55.3	53
Image Classification	CIFAR100-C 1.0 (test)	Avg Acc: 34.4	30
