# Learning Interpretable Rules for Scalable Data Representation and Classification

## About
Rule-based models, e.g., decision trees, are widely used in scenarios demanding high model interpretability for their transparent inner structures and good model expressivity. However, rule-based models are hard to optimize, especially on large data sets, due to their discrete parameters and structures. Ensemble methods and fuzzy/soft rules are commonly used to improve performance, but they sacrifice model interpretability. To obtain both good scalability and interpretability, we propose a new classifier, named Rule-based Representation Learner (RRL), that automatically learns interpretable non-fuzzy rules for data representation and classification. To train the non-differentiable RRL effectively, we project it to a continuous space and propose a novel training method, called Gradient Grafting, that can directly optimize the discrete model using gradient descent. A novel design of logical activation functions is also devised to increase the scalability of RRL and enable it to discretize continuous features end-to-end. Exhaustive experiments on ten small and four large data sets show that RRL outperforms competitive interpretable approaches and can be easily adjusted to obtain a trade-off between classification accuracy and model complexity for different scenarios. Our code is available at: https://github.com/12wang3/rrl.
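To make the Gradient Grafting idea concrete, below is a minimal, hypothetical numpy sketch. It is not the authors' implementation (see the linked repo for that): `graft_step`, the single-layer model, the squared loss, and the 0.5 binarization threshold are all illustrative assumptions. The key point it shows is that the loss gradient is evaluated at the output of the *discrete* (binarized) model, then propagated back through the *continuous* parameters, so gradient descent updates the continuous weights while the decision behavior being optimized is that of the discrete rules.

```python
import numpy as np

def graft_step(w, x, y, lr=0.1):
    """One illustrative Gradient-Grafting-style update (assumed toy setup).

    w : continuous weights in [0, 1]; binarizing them yields discrete rules.
    x : input matrix, y : targets.
    """
    w_b = (w > 0.5).astype(float)   # discrete (rule-like) weights
    y_disc = x @ w_b                # forward pass of the DISCRETE model
    grad_out = y_disc - y           # dL/dy of a squared loss, taken at the discrete output
    grad_w = x.T @ grad_out         # grafted: backprop through the CONTINUOUS path x @ w
    return np.clip(w - lr * grad_w, 0.0, 1.0)

# Toy example: the ideal rule keeps only the first feature.
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 0.0, 1.0])
w = graft_step(np.array([0.6, 0.6]), x, y, lr=0.05)
w_b = (w > 0.5).astype(float)
print(w, x @ w_b)
```

Here a single grafted step moves the continuous weights far enough that the binarized model classifies the toy data perfectly, even though no gradient ever flowed "through" the non-differentiable binarization itself.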
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Classification | Bank | F1 Score | 77.18 | 48 |
| Classification | dota2 | F1 Score | 60.08 | 26 |
| Classification | Overall (13 datasets aggregate) | N-Mean | 85.1 | 26 |
| Classification | FB | F1 Score | 90.11 | 26 |
| Classification | banknote | F1 Score | 100 | 26 |
| Classification | Wine | F1 Score | 98.37 | 26 |
| Classification | c-4 | F1 Score | 72.01 | 26 |
| Classification | magic | F1 Score | 86.24 | 26 |
| Classification | tic-tac-toe | F1 Score | 100 | 26 |
| Classification | Fashion | F1 Score | 89.64 | 26 |