
Lip-Siri: Contactless Open-Sentence Silent Speech with Wi-Fi Backscatter

About

Silent speech interfaces (SSIs) enable silent interaction in noise-sensitive or privacy-sensitive settings. However, existing SSIs face practical deployment trade-offs among privacy, user experience, and energy consumption, and most remain limited to closed-set recognition over small, pre-defined vocabularies of words or sentences, which restricts real-world expressiveness. In this paper, we present Lip-Siri, to the best of our knowledge, the first Wi-Fi backscatter-based SSI that supports open-vocabulary sentence recognition via lexicon-guided subword decoding. Lip-Siri uses a frequency-shifted backscatter tag to isolate tag-modulated reflections and suppress interference from non-target motions, enabling reliable extraction of lip-motion traces from ubiquitous Wi-Fi signals. We then segment continuous traces into lip-motion units, cluster them, learn robust unit representations via cluster-based self-supervision, and finally propose a lexicon-guided Transformer encoder-decoder with beam search to decode variable-length sentence sequences. We implement an end-to-end prototype and evaluate it with 15 participants on 340 sentences and 3,398 words across multiple scenarios. Lip-Siri achieves 85.61% accuracy on word prediction and a WER of 36.87% on continuous sentence recognition, approaching the performance of representative vision-based lip-reading systems.
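The unit-discovery step described in the abstract (segment the continuous lip-motion trace into units, then cluster them so the cluster ids can serve as self-supervision targets) can be sketched roughly as below. The fixed-length windowing, the toy 1-D trace, and the plain k-means are illustrative assumptions; the paper's actual segmentation, features, and representation model are not specified on this page.

```python
import numpy as np

def segment_units(trace, unit_len):
    """Split a continuous lip-motion trace into fixed-length units.
    (The paper segments by motion; fixed windows are a simplification.)"""
    n = len(trace) // unit_len
    return np.stack([trace[i * unit_len:(i + 1) * unit_len] for i in range(n)])

def kmeans_pseudolabels(units, k, iters=20):
    """Cluster units with plain k-means; the resulting cluster ids act as
    pseudo-labels for cluster-based self-supervised training."""
    centroids = units[:k].copy()  # deterministic init: first k units
    for _ in range(iters):
        # pairwise distances: (n_units, k)
        dists = np.linalg.norm(units[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = units[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

# Toy trace: two alternating motion patterns (a sine bump vs. stillness).
trace = np.concatenate([np.sin(np.linspace(0, np.pi, 20)) if i % 2 == 0
                        else np.zeros(20) for i in range(10)])
units = segment_units(trace, unit_len=20)
labels = kmeans_pseudolabels(units, k=2)
print(labels.tolist())  # → [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```

In the real system these pseudo-labels would supervise a representation encoder over noisy Wi-Fi-derived traces, rather than label the toy windows directly.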

Ye Tian, Haohua Du, Chao Gu, Junyang Zhang, Shanyue Wang, Hao Zhou, Jiahui Hou, Xiang-Yang Li• 2026

Related benchmarks

Task: Silent Speech Recognition
Dataset: SOTA sensing-based SSI datasets (various)
Result: 85.61% accuracy
Rank: 9
