
Close to Reality: Interpretable and Feasible Data Augmentation for Imbalanced Learning

About

Many machine learning classification tasks involve imbalanced datasets, which are often subject to over-sampling techniques aimed at improving model performance. However, these techniques are prone to generating unrealistic or infeasible samples. Furthermore, they often function as black boxes, lacking interpretability in their procedures. This opacity makes it difficult to assess their effectiveness and make necessary adjustments, and they may ultimately fail to yield significant performance improvements. To bridge this gap, we introduce Decision Predicate Graphs for Data Augmentation (DPG-da), a framework that extracts interpretable decision predicates from trained models to capture domain rules and enforce them during sample generation. This design ensures that over-sampled data remain diverse, constraint-satisfying, and interpretable. In experiments on synthetic and real-world benchmark datasets, DPG-da consistently improves classification performance over traditional over-sampling methods, while guaranteeing logical validity and offering clear, interpretable explanations of the over-sampled data.
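The abstract does not include code, but the core loop it describes — extract interpretable predicates, then accept only synthetic samples that satisfy them — can be sketched. Everything below is an illustrative assumption, not the paper's implementation: simple per-feature range predicates stand in for the paper's decision predicates, and a SMOTE-like interpolation with noise stands in for its sample generator.

```python
import random

def extract_predicates(minority):
    """Derive per-feature [min, max] range predicates from minority samples.
    (A stand-in for predicates extracted from a trained model's decision graph.)"""
    dims = len(minority[0])
    return [(min(x[d] for x in minority), max(x[d] for x in minority))
            for d in range(dims)]

def satisfies(sample, predicates):
    """Check that every feature value falls inside its predicate's range."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(sample, predicates))

def oversample(minority, n_new, predicates, sigma=0.2, seed=0, max_tries=10000):
    """SMOTE-like interpolation between minority pairs, plus Gaussian noise;
    candidates violating any predicate are rejected, so the synthetic data
    stays inside the feasible region the predicates describe."""
    rng = random.Random(seed)
    out, tries = [], 0
    while len(out) < n_new and tries < max_tries:
        tries += 1
        a, b = rng.sample(minority, 2)
        t = rng.random()
        cand = [ai + t * (bi - ai) + rng.gauss(0, sigma)
                for ai, bi in zip(a, b)]
        if satisfies(cand, predicates):  # enforce the extracted domain rules
            out.append(cand)
    return out

# Toy minority class with two features.
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.2]]
preds = extract_predicates(minority)        # [(1.0, 2.0), (2.0, 2.5)]
new_samples = oversample(minority, 5, preds)
```

The rejection step is where interpretability pays off: every synthetic point can be audited against human-readable rules, rather than trusting the generator blindly.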

Matheus Camilo da Silva, Gabriel Gustavo Costanzo, Andrea de Lorenzo, Sylvio Barbon Junior • 2026

Related benchmarks

Task                       Dataset         Metric     Result   Rank
Imbalanced Classification  abalone 19      F1-Score   62.7     25
Imbalanced Classification  Arrhythmia      F1-Score   86       25
Imbalanced Classification  coil 2000       F1-Score   61.9     25
Imbalanced Classification  ecoli           F1-Score   84.2     25
Imbalanced Classification  Isolet          F1-Score   88.4     25
Imbalanced Classification  oil             F1-Score   71.3     25
Imbalanced Classification  optical_digits  F1-Score   93.7     25
Imbalanced Classification  ozone_level     F1-Score   68.2     25
Imbalanced Classification  Scene           F1-Score   65.7     25
Imbalanced Classification  spectrometer    F1-Score   90       25

Showing 10 of 28 rows
