FedVideoMAE: Efficient Privacy-Preserving Federated Video Moderation

About

Short-form video moderation increasingly needs learning pipelines that protect user privacy without paying the full bandwidth and latency cost of cloud-centralized inference. We present FedVideoMAE, an on-device federated framework for video violence detection that combines self-supervised VideoMAE representations, LoRA-based parameter-efficient adaptation, client-side DP-SGD, and server-side secure aggregation. By updating only 5.5M parameters (about 3.5% of a 156M backbone), FedVideoMAE reduces communication by 28.3x relative to full-model federated updates while keeping raw videos on device throughout training. On RWF-2000 with 40 clients, the method reaches 77.25% accuracy without privacy protection and 65~66% under strong differential privacy. We further show that this privacy gap is consistent with an effective-SNR analysis tailored to the small-data, parameter-efficient federated regime, which indicates roughly 8.5~12x DP-noise amplification in our setting. To situate these results more clearly, we also compare against archived full-model federated baselines and summarize auxiliary transfer behavior on RLVS and binary UCF-Crime. Taken together, these findings position FedVideoMAE as a practical operating point for privacy-preserving video moderation on edge devices. Our code can be found at: https://github.com/zyt-599/FedVideoMAE.

Ziyuan Tao, Chuanzhi Xu, Sandaru Jayawardana, Adnan Mahmood, Wei Bao, Kanchana Thilakarathna, Teng Joon Lim• 2025

Related benchmarks

Task	Dataset	Result	Rank
Violence Detection	RWF-2000	Accuracy77.25		17
Video Activity Recognition	UCF-101	--		12

Showing 2 of 2 rows

Other info

Follow for update

@wizwand_team Discord