Privacy Preserving Vertical Federated Learning for Tree-based Models

About

Federated learning (FL) is an emerging paradigm that enables multiple organizations to jointly train a model without revealing their private data to each other. This paper studies vertical federated learning, which tackles scenarios where (i) the collaborating organizations own data on the same set of users but with disjoint features, and (ii) only one organization holds the labels. We propose Pivot, a novel solution for privacy-preserving vertical decision tree training and prediction, ensuring that no intermediate information is disclosed other than what the clients have agreed to release (i.e., the final tree model and the prediction output). Pivot does not rely on any trusted third party and provides protection against a semi-honest adversary that may compromise $m-1$ out of $m$ clients. We further identify two privacy leakages that arise when the trained decision tree model is released in plaintext and propose an enhanced protocol to mitigate them. The proposed solution can also be extended to tree ensemble models, e.g., random forest (RF) and gradient boosting decision tree (GBDT), by treating single decision trees as building blocks. Theoretical and experimental analyses suggest that Pivot is efficient for the privacy achieved.

Yuncheng Wu, Shaofeng Cai, Xiaokui Xiao, Gang Chen, Beng Chin Ooi • 2020
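
The vertical setting described in the abstract (same users, disjoint features, labels held by one party) can be pictured with a small plaintext sketch. This is only an illustration of the data layout, not the Pivot protocol: it performs no encryption or secure computation, and the function name `vertical_partition`, the client names, and the toy feature values are all hypothetical.

```python
from typing import Dict, List, Sequence


def vertical_partition(
    rows: Sequence[Sequence[float]],
    feature_names: Sequence[str],
    assignment: Dict[str, List[str]],
) -> Dict[str, List[Dict[str, float]]]:
    """Split a table column-wise: every client keeps all users (rows)
    but only its own, disjoint subset of the features (columns)."""
    index = {name: i for i, name in enumerate(feature_names)}

    # The vertical setting assumes no feature is held by two clients.
    assigned = [f for feats in assignment.values() for f in feats]
    if len(assigned) != len(set(assigned)):
        raise ValueError("feature sets must be disjoint")

    return {
        client: [{f: row[index[f]] for f in feats} for row in rows]
        for client, feats in assignment.items()
    }


# Toy data: three users, three features (hypothetical values).
rows = [[25, 40_000, 1], [52, 85_000, 0], [37, 62_000, 1]]
features = ["age", "income", "owns_card"]

views = vertical_partition(
    rows,
    features,
    {"bank": ["age", "income"], "retailer": ["owns_card"]},
)

# Only one client holds the labels (here, by assumption, the bank).
label_holder = {"client": "bank", "labels": [0, 1, 0]}
```

Each entry of `views` has one record per user, aligned by position, so row `i` in the bank's view and row `i` in the retailer's view describe the same user. A protocol such as the one the abstract describes would need to train a tree over these views without any client seeing another's columns or the labels.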

Related benchmarks

Task	Dataset	Metric	Value	
Classification	SKINNONSKIN	F1-Score	74.3	7
Binary Classification	Breast cancer	F1 Score	91.9	5
Binary Classification	a9a	F1 Score	65.3	5
Binary Classification	cod-rna	F1 Score	40.8	5
Binary Classification	covtype binary	F1 Score	57.2	5
Binary Classification	Phishing	F1 Score	95.7	5
Private GBDT Training	Synthetic LAN (n=5x10^4, D=4, B=8, m0=8, m1=7)	Training Time (s)	1.68e+3	3
Private GBDT Training	Synthetic LAN (n=2x10^5, D=4, B=8, m0=8, m1=7)	Training Time (s)	448	3

