Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning
About
Imbalanced-learn is an open-source Python toolbox that provides a wide range of methods for coping with imbalanced datasets, a problem frequently encountered in machine learning and pattern recognition. The implemented state-of-the-art methods fall into four groups: (i) under-sampling, (ii) over-sampling, (iii) combination of over- and under-sampling, and (iv) ensemble learning methods. The toolbox depends only on numpy, scipy, and scikit-learn and is distributed under the MIT license. Furthermore, it is fully compatible with scikit-learn and is part of the scikit-learn-contrib supported projects. Documentation, unit tests, and integration tests are provided to ease usage and contribution. The toolbox is publicly available on GitHub: https://github.com/scikit-learn-contrib/imbalanced-learn.
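To make the first two groups concrete, here is a minimal sketch of random over-sampling and random under-sampling using only the Python standard library. It is an illustration of the general idea, not the toolbox's implementation or API; the function names `random_over_sample` and `random_under_sample` and the toy data are assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    """Collect the feature rows belonging to each class label."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return groups


def random_over_sample(X, y, seed=0):
    """Hypothetical helper: duplicate random rows until every class
    is as large as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_res, y_res = [], []
    for label, rows in groups.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_res.append(features)
            y_res.append(label)
    return X_res, y_res


def random_under_sample(X, y, seed=0):
    """Hypothetical helper: keep a random subset of each class so that
    every class is as small as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_res, y_res = [], []
    for label, rows in groups.items():
        # Sample without replacement so no row is kept twice.
        for features in rng.sample(rows, target):
            X_res.append(features)
            y_res.append(label)
    return X_res, y_res


# Toy dataset: 9 samples of class 0 and 3 samples of class 1.
X = [[i, i % 3] for i in range(12)]
y = [0] * 9 + [1] * 3

X_over, y_over = random_over_sample(X, y)
X_under, y_under = random_under_sample(X, y)
print(Counter(y_over))   # both classes now have 9 samples
print(Counter(y_under))  # both classes now have 3 samples
```

Over-sampling keeps every original row at the cost of duplicates, while under-sampling discards majority-class rows; combining the two, the third group named above, trades off between these effects.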
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Classification | Penguins | Improvement in Balanced Accuracy | 7.8 | 8 |
| Classification | xd6 | Improvement in BA | 0.003 | 8 |
| Classification | prnn_crabs | Balanced Accuracy Improvement | 0.055 | 8 |
| Data Synthesis | mofn 3_7_10 | MMD | 0.201 | 8 |
| Data Synthesis | xd6 | MMD | 0.18 | 8 |
| Classification | irish | Improvement in Balanced Acc | -0.017 | 8 |
| Data Synthesis | backache | MMD | 0.133 | 8 |
| Data Synthesis | parity5+5 | MMD | 0.135 | 8 |
| Data Synthesis | germangss | MMD | 0.083 | 8 |
| Classification | mofn-3-7-10 | Improvement in Balanced Accuracy | -0.018 | 8 |
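
The classification rows above report changes in balanced accuracy. As a rough sketch, assuming the usual definition (the mean of per-class recall), the metric can be computed as follows. The helper name `balanced_accuracy` and the toy labels are illustrative assumptions, and the benchmarks may compute their figures differently.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (assumed definition of balanced accuracy)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Recall of a class = correctly predicted samples / all samples of that class.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy case: 9 majority samples, 1 minority sample, and a classifier
# that always predicts the majority class.
y_true = [0] * 9 + [1]
y_pred = [0] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy rewards the majority-only classifier, while balanced accuracy exposes that it never detects the minority class, which is why the metric suits imbalanced datasets.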