Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data

About

Nowadays, deep neural networks (DNNs) have become the main instrument for machine learning tasks within a wide range of domains, including vision, NLP, and speech. Meanwhile, in an important case of heterogenous tabular data, the advantage of DNNs over shallow counterparts remains questionable. In particular, there is no sufficient evidence that deep learning machinery allows constructing methods that outperform gradient boosting decision trees (GBDT), which are often the top choice for tabular problems. In this paper, we introduce Neural Oblivious Decision Ensembles (NODE), a new deep learning architecture, designed to work with any tabular data. In a nutshell, the proposed NODE architecture generalizes ensembles of oblivious decision trees, but benefits from both end-to-end gradient-based optimization and the power of multi-layer hierarchical representation learning. With an extensive experimental comparison to the leading GBDT packages on a large number of tabular datasets, we demonstrate the advantage of the proposed NODE architecture, which outperforms the competitors on most of the tasks. We open-source the PyTorch implementation of NODE and believe that it will become a universal framework for machine learning on tabular data.

Sergei Popov, Stanislav Morozov, Artem Babenko• 2019

Related benchmarks

TaskDatasetResultRank
ClassificationCredit
ROCAUC98.1
50
ClassificationGE
Accuracy53.9
37
RegressionHousing
RMSE0.523
26
RegressionYear
MSE76.21
25
Binary ClassificationMIMIC 2
AUC0.843
25
Tabular ClassificationTabZilla avg across 98 datasets
Mean Accuracy83.8
20
Binary ClassificationIncome
AUC0.919
19
Tabular ClassificationNUM (L) (test)
Macro F10.956
18
Classificationkc1
Balanced Accuracy55.803
18
Binary ClassificationHIGGS small (test)
Accuracy (%)72.6
15
Showing 10 of 54 rows

Other info

Follow for update