Almost-Free Queue Jumping for Prior Inputs in Private Neural Inference

About

Privacy-Preserving Machine Learning as a Service (PP-MLaaS) enables secure neural network inference by integrating cryptographic primitives such as homomorphic encryption (HE) and multi-party computation (MPC), protecting both client data and server models. Recent mixed-primitive frameworks have significantly improved inference efficiency, yet they process batched inputs sequentially, offering little flexibility for prioritizing urgent requests. Naive queue jumping introduces considerable computational and communication overhead, adding non-negligible latency for in-queue inputs. We initiate the study of privacy-preserving queue jumping in batched inference and propose PrivQJ, a novel framework that enables efficient priority handling without degrading overall system performance. PrivQJ exploits shared computation across inputs via in-processing slot recycling, allowing prior inputs to be piggybacked onto ongoing batch computation with almost no additional cryptographic cost. Both theoretical analysis and experimental results demonstrate over an order-of-magnitude reduction in overhead compared to state-of-the-art PP-MLaaS systems.

Qiao Zhang, Minghui Xu, Tingchuang Zhang, Xiuzhen Cheng • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Private Inference Efficiency | In-queue inputs, WAN2, online phase | Latency (s): 4.1 | 48 |
| Private Inference Efficiency | In-queue inputs, LAN, online phase | Communication Overhead (MiB): 31.7 | 48 |
| Online overhead computation | LAN | Communication (MiB): 0.19 | 32 |
| Online overhead computation | WAN1 | Latency (s): 0.049 | 32 |
| Online overhead computation | WAN2 | Latency (s): 0.049 | 32 |
| Online overhead computation | WAN3 | Time (s): 0.049 | 32 |
| Online overhead computation | WAN4 | Execution Time (s): 0.05 | 32 |
| Private Inference Efficiency | In-queue inputs, WAN3, online phase | Inference Time (s): 1.9 | 32 |
| Private Inference Efficiency | In-queue inputs, WAN4, online phase | Time (s): 2.6 | 32 |
| Private Inference Efficiency | In-queue inputs, WAN1, online phase | Time (s): 3.3 | 32 |
Showing 10 of 26 rows
