XGBoost: A Scalable Tree Boosting System
About
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
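To make the idea of tree boosting concrete, here is a minimal sketch of gradient boosting with squared loss and depth-1 "stump" learners. This is illustrative only: XGBoost itself uses second-order gradients, explicit regularization, the sparsity-aware split finding and weighted quantile sketch described in the paper, plus many systems-level optimizations. All names here (`fit_stump`, `boost`) are hypothetical helpers, not XGBoost API.

```python
# Minimal gradient-boosting sketch (squared loss, decision stumps).
# Illustrative only -- not the XGBoost algorithm or API.

def fit_stump(x, residuals):
    """Find the threshold split on x that best fits the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi: lmean if xi <= t else rmean

def boost(x, y, n_rounds=50, eta=0.3):
    """Additively fit stumps to the residuals, shrunk by learning rate eta."""
    preds = [0.0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + eta * stump(xi) for pi, xi in zip(preds, x)]
    return lambda xi: sum(eta * s(xi) for s in stumps)

# Toy data: y is a step function of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model = boost(x, y)
print(round(model(1.5), 2), round(model(5.5), 2))  # prints: 0.0 1.0
```

Each round fits a weak learner to the current residuals and adds it with a shrinkage factor, which is the additive-ensemble structure that XGBoost scales up with regularized second-order objectives.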
Tianqi Chen, Carlos Guestrin • 2016
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-10 (test) | Accuracy | 45 | 3381 |
| Image Classification | MNIST (test) | Accuracy | 97.5 | 882 |
| Image Classification | MNIST | Accuracy | 98.02 | 395 |
| Click-Through Rate Prediction | Avazu (test) | AUC | 0.7753 | 191 |
| Click-Through Rate Prediction | Criteo (test) | AUC | 0.7862 | 141 |
| Regression | Dst index (test) | RMSE | 16.1 | 126 |
| Tabular Classification | 75 Tabular Classification Datasets (test) | Accuracy | 62.98 | 89 |
| Tabular Regression | 52 Tabular Datasets (test) | NMAE | 0.16 | 85 |
| Classification | CUB (test) | Accuracy | 71.86 | 79 |
| Feature Selection | Simulated Data | ROC AUC | 59.3 | 70 |
Showing 10 of 624 benchmark rows.