Bifrost: A Much Simpler Secure Two-Party Data Join Protocol for Secure Data Analytics
About
Secure data join enables two parties with vertically distributed data to securely compute the joined table, allowing the parties to perform downstream Secure multi-party computation-based Data Analytics (SDA), such as training machine learning models, based on the joined table. While Circuit-based Private Set Intersection (CPSI) can be used for secure data join, it introduces redundant dummy rows in the joined table, which results in high overhead in the downstream SDA tasks. iPrivJoin addresses this issue but introduces significant communication overhead in the redundancy removal process, as it relies on the cryptographic primitive OPPRF for data encoding and multiple rounds of oblivious shuffles. In this paper, we propose a much simpler secure data join protocol, Bifrost, which outputs (the secret shares of) a redundancy-free joined table. The highlight of Bifrost lies in its simplicity: it builds upon two conceptually simple building blocks, an ECDH-PSI protocol and a two-party oblivious shuffle protocol. The lightweight protocol design allows Bifrost to avoid the need for OPPRF. We also proposed a simple optimization named \textit{dual mapping} that reduces the rounds of oblivious shuffle needed from two to one. Experiments on datasets of up to 100 GB show that Bifrost achieves $2.54 \sim 22.32\times$ speedup and reduces the communication by $84.15\% \sim 88.97\%$ compared to the SOTA redundancy-free secure data join protocol iPrivJoin. Notably, the communication size of Bifrost is nearly equal to the size of the input data. In the two-step SDA pipeline evaluation (secure join and SDA), the redundancy-free property of Bifrost not only avoids the catastrophic error rate blowup in the downstream tasks caused by the dummy rows in the joined table (as introduced in CPSI), but also shows up to $2.80\times$ speed-up in the SDA process with up to $73.15\%$ communication reduction.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Secure Pearson Correlation | Education Career Success | Latency (s)0.9 | 6 | |
| Secure Pearson Correlation | Breast Cancer Gene | Time (s)0.36 | 6 | |
| Secure Pearson Correlation | Skin Cancer | Time (s)1.93 | 6 | |
| Secure Pearson Correlation | a9a | Latency (s)5.91 | 6 | |
| Secure Pearson Correlation | ARCENE | Latency (s)0.27 | 6 | |
| Secure Pearson Correlation | MNIST | Time (s)11.55 | 6 | |
| Secure Pearson Correlation | Tiny-ImageNet | Time (s)24.52 | 6 | |
| Secure Pearson Correlation | CelebA | Time (s)52.82 | 6 | |
| Secure Statistical Analysis | Education Career Success | Latency (s)4.21 | 3 | |
| Secure Statistical Analysis | Breast Cancer Gene | Latency (s)12.65 | 3 |