Large-Scale Secure XGB for Vertical Federated Learning

About

Privacy-preserving machine learning has drawn increasing attention recently, especially as various privacy regulations come into force. In this context, Federated Learning (FL) has emerged to facilitate privacy-preserving joint modeling among multiple parties. Although many federated algorithms have been extensively studied, there is still a lack of secure and practical gradient tree boosting models (e.g., XGB) in the literature. In this paper, we aim to build large-scale secure XGB under the vertical federated learning setting. We guarantee data privacy from three aspects. Specifically, (i) we employ secure multi-party computation techniques to avoid leaking intermediate information during training, (ii) we store the output model in a distributed manner in order to minimize information release, and (iii) we provide a novel algorithm for secure XGB prediction with the distributed model. Furthermore, by proposing secure permutation protocols, we improve training efficiency and make the framework scale to large datasets. We conduct extensive experiments on both public and real-world datasets, and the results demonstrate that our proposed XGB models provide not only competitive accuracy but also practical performance.

Wenjing Fang, Derun Zhao, Jin Tan, Chaochao Chen, Chaofan Yu, Li Wang, Lei Wang, Jun Zhou, Benyu Zhang • 2020
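
The abstract's first privacy measure, using secure multi-party computation so that intermediate training values are never revealed, can be illustrated with additive secret sharing, one common MPC building block. The sketch below is an assumption made for illustration only, not the protocol from the paper: the ring size `MODULUS`, the fixed-point `SCALE`, the two-party setup, and all function names are hypothetical. It shows two parties each holding one random-looking share of every gradient, summing their shares locally, and reconstructing only the aggregate.

```python
import secrets
from typing import List, Tuple

MODULUS = 2 ** 64   # ring for additive shares (assumed, not from the paper)
SCALE = 2 ** 16     # fixed-point scaling factor (assumed)


def encode(x: float) -> int:
    """Map a real value to a fixed-point element of the ring."""
    return int(round(x * SCALE)) % MODULUS


def decode(v: int) -> float:
    """Map a ring element back to a real value (upper half is negative)."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def share(value: int) -> Tuple[int, int]:
    """Split a ring element into two additive shares; each alone is uniform."""
    r = secrets.randbelow(MODULUS)
    return r, (value - r) % MODULUS


def reconstruct(s0: int, s1: int) -> int:
    """Recombine two additive shares into the original ring element."""
    return (s0 + s1) % MODULUS


def local_sum(shares: List[int]) -> int:
    """Each party adds its own shares locally; no communication is needed."""
    return sum(shares) % MODULUS


# Hypothetical per-sample gradients falling into one histogram bucket.
gradients = [0.25, -0.5, 0.125, -0.75]
pairs = [share(encode(g)) for g in gradients]

sum_party0 = local_sum([p[0] for p in pairs])  # held by party 0 only
sum_party1 = local_sum([p[1] for p in pairs])  # held by party 1 only

# Only the aggregated gradient is opened, never the individual values.
total = decode(reconstruct(sum_party0, sum_party1))
```

Because addition commutes with sharing, the bucket sum is computed without either party seeing an individual gradient. Operations such as comparison or division, which split finding also needs, require further protocols that this sketch does not cover.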

Related benchmarks

Task	Dataset	Result
Classification	SKINNONSKIN	F1 Score 74.1	7
Binary Classification	cod-rna	F1 Score 40.3	5
Binary Classification	Breast cancer	F1 Score 88.9	5
Binary Classification	Phishing	F1 Score 95.1	5
Binary Classification	a9a	F1 Score 64.3	5
Binary Classification	covtype binary	F1 Score 55.2	5
Private GBDT Training	Synthetic LAN (n=1.4x10^5, D=5, B=10, m0=7, m1=16)	Training Time (s) 476	3
Private GBDT Training	Synthetic (n=1.4x10^5, D=5, B=10, m0=7, m1=16) - 100Mbps WAN	Training Time (s) 1.51e+3	3
