
EncFormer: Secure and Efficient Transformer Inference over Encrypted Data

About

Transformer inference in machine-learning-as-a-service (MLaaS) raises privacy concerns for sensitive user inputs. Prior secure solutions that combine fully homomorphic encryption (FHE) and secure multiparty computation (MPC) are bottlenecked by inefficient FHE kernels, communication-heavy MPC protocols, and expensive FHE-MPC conversions. We present EncFormer, a two-party private Transformer inference framework that introduces Stage Compatible Patterns so that FHE kernels compose efficiently, reducing repacking and conversions. EncFormer also provides a cost analysis model built around a minimal-conversion baseline, enabling principled selection of FHE-MPC boundaries. To further reduce communication, EncFormer proposes a secure complex CKKS-MPC conversion protocol and designs communication-efficient MPC protocols for nonlinearities. With GPU optimizations, evaluations on GPT- and BERT-style models show that EncFormer achieves 1.4x-30.4x lower online MPC communication and 1.3x-9.8x lower end-to-end latency than prior hybrid FHE-MPC systems, and 1.9x-3.5x lower end-to-end latency on BERT-base than FHE-only pipelines under a matched backend, while maintaining near-plaintext accuracy on selected GLUE tasks.
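The abstract refers to two-party MPC protocols and FHE-MPC conversions without giving protocol details. As background, the core primitive on the MPC side of such hybrid systems is additive secret sharing over a ring: a value held in encrypted or shared form is split so that neither party learns it, yet linear operations can be done locally. The sketch below is purely illustrative background, not EncFormer's actual conversion protocol; the modulus and function names are assumptions chosen for clarity.

```python
import secrets

# Illustrative ring Z_{2^64}; EncFormer's actual parameters are not given here.
MOD = 1 << 64

def share(x: int) -> tuple[int, int]:
    """Split x into two additive shares: x = (s0 + s1) mod 2^64.
    Each share alone is uniformly random and reveals nothing about x."""
    s0 = secrets.randbelow(MOD)
    s1 = (x - s0) % MOD
    return s0, s1

def reconstruct(s0: int, s1: int) -> int:
    """Both parties combine their shares to recover the value."""
    return (s0 + s1) % MOD

def add_shares(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Addition of shared values is local: each party adds its own shares,
    so it costs no communication (nonlinear ops are what need protocols)."""
    return ((a[0] + b[0]) % MOD, (a[1] + b[1]) % MOD)

x, y = 12345, 67890
zs = add_shares(share(x), share(y))
assert reconstruct(*zs) == (x + y) % MOD
```

This locality of linear operations is why hybrid systems reserve MPC for nonlinearities (softmax, GELU, layer norm) and why reducing the communication of those protocols, as the abstract claims EncFormer does, directly lowers online overhead.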

Yufan Zhu, Chao Jin, Khin Mi Mi Aung, Xiaokui Xiao • 2026

Related benchmarks

Task                            Dataset     Metric                        Result   Rank
Inference Latency               BERT base   Attention Layer Latency (s)   40.54    6
Natural Language Understanding  GLUE (val)  SST-2 Accuracy                91.78    6
Secure Transformer Inference    BERT base   Online Overhead (GB)          2.2      4
