OpenFE: Automated Feature Generation with Expert-level Performance

About

The goal of automated feature generation is to liberate machine learning experts from the laborious task of manual feature generation, a step that is crucial for improving learning performance on tabular data. The major challenge in automated feature generation is to efficiently and accurately identify effective features from a vast pool of candidate features. In this paper, we present OpenFE, an automated feature generation tool that achieves results competitive with those of machine learning experts. OpenFE achieves high efficiency and accuracy with two components: 1) a novel feature boosting method for accurately evaluating the incremental performance of candidate features and 2) a two-stage pruning algorithm that performs feature pruning in a coarse-to-fine manner. Extensive experiments on ten benchmark datasets show that OpenFE outperforms existing baseline methods by a large margin. We further evaluate OpenFE in two Kaggle competitions with thousands of data science teams participating. In these two competitions, features generated by OpenFE with a simple baseline model outperform 99.3% and 99.6% of data science teams, respectively. In addition to the empirical results, we provide a theoretical perspective to show that feature generation can be beneficial in a simple yet representative setting. The code is available at https://github.com/ZhangTP1996/OpenFE.
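
The two components named in the abstract can be illustrated with a small toy sketch. This is not the OpenFE implementation: the abstract gives no algorithmic detail, so everything below is an assumption made for illustration. The "base model" is a hypothetical fixed predictor that already captures the additive part of the target. The incremental value of a candidate feature is approximated by how much a single decision stump on that feature reduces the base model's squared residuals. The coarse stage is assumed to score candidates on a random subsample, and the fine stage on all rows. The function names `stump_gain` and `two_stage_prune` are invented here.

```python
import random


def stump_gain(feature, residuals):
    """Per-row reduction in squared error when one decision stump on `feature`
    is fitted to the residuals of a fixed base model (never retrained)."""
    pairs = sorted(zip(feature, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    base_sse = sum(r * r for _, r in pairs)  # error with no correction at all
    best_sse = base_sse
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two equal feature values
        right_sum = total - left_sum
        # SSE after predicting the mean residual on each side of the split.
        sse = base_sse - left_sum ** 2 / i - right_sum ** 2 / (n - i)
        best_sse = min(best_sse, sse)
    return (base_sse - best_sse) / n


def two_stage_prune(candidates, residuals, sample_size, keep, seed=0):
    """Coarse stage: score all candidates on a subsample, keep the better half.
    Fine stage: re-score the survivors on all rows and return the top `keep`."""
    rng = random.Random(seed)
    n = len(residuals)
    idx = rng.sample(range(n), min(sample_size, n))
    sub_res = [residuals[i] for i in idx]
    coarse = {name: stump_gain([vals[i] for i in idx], sub_res)
              for name, vals in candidates.items()}
    n_keep = max(keep, len(coarse) // 2)
    survivors = sorted(coarse, key=coarse.get, reverse=True)[:n_keep]
    fine = {name: stump_gain(candidates[name], residuals) for name in survivors}
    return sorted(fine, key=fine.get, reverse=True)[:keep]


# Synthetic data (assumption): the target has an interaction term x1 * x2
# that the hypothetical base model, which predicts only x1, has missed.
rng = random.Random(42)
x1 = [rng.uniform(-1, 1) for _ in range(2000)]
x2 = [rng.uniform(-1, 1) for _ in range(2000)]
y = [a + a * b for a, b in zip(x1, x2)]
base_pred = x1  # stand-in for predictions of an already trained model
residuals = [t - p for t, p in zip(y, base_pred)]

# Candidate features built from simple operators on the base features.
candidates = {
    "x1*x2": [a * b for a, b in zip(x1, x2)],
    "x1+x2": [a + b for a, b in zip(x1, x2)],
    "x1-x2": [a - b for a, b in zip(x1, x2)],
    "noise": [rng.random() for _ in range(2000)],
}

selected = two_stage_prune(candidates, residuals, sample_size=500, keep=1)
print(selected)
```

Scoring candidates against fixed residuals avoids retraining the full model for every candidate, which is one plausible reading of "incremental performance". The subsample stage trades accuracy for speed, so that only a few candidates reach the more expensive full-data evaluation.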

Tianping Zhang, Zheyu Zhang, Zhiyuan Fan, Haoyan Luo, Fengyuan Liu, Qian Liu, Wei Cao, Jian Li • 2022

Related benchmarks

Task | Dataset | Result | Rank
Classification | Electricity | Mean Test Error Rate: 0.0793 | 27
Regression | Housing | RMSE: 0.228 | 26
Classification | German Credit UCIrvine | Macro F1: 74.5 | 25
Regression | Airfoil UCIrvine | 1-RAE: 0.5746 | 24
Regression | Openml_586 | 1-RAE: 0.6311 | 24
Classification | SVMGuide3 LibSVM (5-fold cross-val) | Macro F1: 83.05 | 17
Classification | Amazon Employee Kaggle (5-fold cross-validation) | Macro F1: 93.44 | 17
Classification | German Credit UCIrvine (5-fold cross-val) | Macro F1: 0.745 | 17
Classification | PimaIndian Kaggle (5-fold cross-validation) | Macro F1 Score: 80.86 | 17
Classification | Ionosphere UCIrvine (5-fold cross-validation) | Macro F1 Score: 93.37 | 17
Showing 10 of 56 rows
