The autofeat Python Library for Automated Feature Engineering and Selection
About
This paper describes the autofeat Python library, which provides scikit-learn style linear regression and classification models with automated feature engineering and selection capabilities. Complex non-linear machine learning models, such as neural networks, are in practice often difficult to train and even harder to explain to non-statisticians, who require transparent analysis results as a basis for important business decisions. Linear models are efficient and intuitive, but they generally provide lower prediction accuracy. Our library provides a multi-step feature engineering and selection process: first, a large pool of non-linear features is generated; then a small and robust set of meaningful features is selected from this pool. The selected features improve the prediction accuracy of a linear model while retaining its interpretability.
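The generate-then-select idea can be illustrated with a small NumPy sketch. This is not the autofeat implementation: the transformations, the greedy residual-correlation selection, and the names `generate_features` and `select_features` are all assumptions chosen for illustration, and the data is synthetic.

```python
import itertools
import numpy as np


def generate_features(X, names):
    """Step 1 (illustrative): expand raw columns into a pool of non-linear candidates."""
    feats, labels = [], []
    for j, name in enumerate(names):
        col = X[:, j]
        feats += [col, col ** 2, col ** 3,
                  np.log(np.abs(col) + 1.0), np.sqrt(np.abs(col))]
        labels += [name, f"{name}^2", f"{name}^3",
                   f"log(|{name}|+1)", f"sqrt(|{name}|)"]
    # Pairwise products of the raw columns
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        feats.append(X[:, i] * X[:, j])
        labels.append(f"{names[i]}*{names[j]}")
    return np.column_stack(feats), labels


def select_features(F, y, max_features=3, tol=1e-3):
    """Step 2 (illustrative): greedy forward selection.

    Repeatedly add the candidate most correlated with the current residual,
    refit an ordinary least-squares model with intercept on the selected
    columns, and stop once the residual variance is negligible.
    """
    Fs = (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-12)  # standardized copy for scoring
    selected = []
    residual = y - y.mean()
    coef = np.array([y.mean()])
    for _ in range(max_features):
        scores = np.abs(Fs.T @ residual)
        if selected:
            scores[selected] = -np.inf  # never pick the same feature twice
        selected.append(int(np.argmax(scores)))
        A = np.column_stack([np.ones(len(y)), F[:, selected]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
        if residual.var() < tol * y.var():
            break
    return selected, coef  # coef[0] is the intercept


# Synthetic data: the target depends non-linearly on the two inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 2.0 + 3.0 * X[:, 0] ** 2 - 1.5 * X[:, 0] * X[:, 1] + rng.normal(scale=0.01, size=500)

F, labels = generate_features(X, ["x1", "x2"])
selected, coef = select_features(F, y)
chosen = [labels[i] for i in selected]
print("selected features:", chosen)
print("intercept and weights:", np.round(coef, 3))
```

The final model stays linear in the selected features, so each weight can be read off directly, which is the interpretability argument made above. A production approach would typically use a more robust selection criterion than this greedy rule.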
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Classification | Adult | Accuracy | 81.4 | 86 |
| Classification | vehicle | Accuracy | 78.8 | 65 |
| Classification | Credit | ROCAUC | 67.6 | 63 |
| Classification | Heart | Accuracy | 85.7 | 59 |
| Classification | Bank | -- | -- | 48 |
| Classification | CAR | Accuracy | 99.8 | 47 |
| Multiclass Classification | CMC | Accuracy | 50.5 | 41 |
| Classification | Churn | AUROC | 0.829 | 33 |
| Classification | Balance Scale | Accuracy | 92.5 | 29 |
| Classification | BreastW | Accuracy | 95.6 | 29 |