Fingerprinting Concepts in Data Streams with Supervised and Unsupervised Meta-Information
About
Streaming sources of data are becoming more common as the ability to collect data in real-time grows. A major concern in dealing with data streams is concept drift, a change in the distribution of data over time, for example, due to changes in environmental conditions. Representing concepts (stationary periods featuring similar behaviour) is a key idea in adapting to concept drift. By testing the similarity of a concept representation to a window of observations, we can detect concept drift to a new or previously seen recurring concept. Concept representations are constructed using meta-information features, values describing aspects of concept behaviour. We find that previously proposed concept representations rely on small numbers of meta-information features. These representations often cannot distinguish concepts, leaving systems vulnerable to concept drift. We propose FiCSUM, a general framework to represent both supervised and unsupervised behaviours of a concept in a fingerprint, a vector of many distinct meta-information features able to uniquely identify more concepts. Our dynamic weighting strategy learns which meta-information features describe concept drift in a given dataset, allowing a diverse set of meta-information features to be used at once. FiCSUM outperforms state-of-the-art methods over a range of 11 real-world and synthetic datasets, both in accuracy and in modeling the underlying concept drift.
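The idea of comparing a stored concept fingerprint against a window of recent observations can be illustrated with a small sketch. This is not the FiCSUM implementation: the four meta-information features (mean, standard deviation, lag-1 autocorrelation, error rate), the weighted cosine similarity, the fixed uniform weights, the threshold of 0.8, and all function names are assumptions chosen for illustration. In particular, the dynamic weighting described above is replaced here by hand-set weights.

```python
import math
import random
from statistics import mean, pstdev


def fingerprint(xs, errors):
    """Summarise a window as a vector of meta-information features.

    xs: observed feature values (unsupervised behaviour).
    errors: 0/1 flags, 1 where a classifier mispredicted (supervised behaviour).
    """
    mu = mean(xs)
    sigma = pstdev(xs)
    n = len(xs)
    # Lag-1 autocorrelation estimate; defined as 0 for a constant window.
    if sigma > 0:
        autocorr = sum((xs[i] - mu) * (xs[i + 1] - mu) for i in range(n - 1)) / (n * sigma ** 2)
    else:
        autocorr = 0.0
    return [mu, sigma, autocorr, mean(errors)]


def weighted_cosine(a, b, weights):
    """Cosine similarity after scaling each feature by its weight."""
    wa = [w * x for w, x in zip(weights, a)]
    wb = [w * x for w, x in zip(weights, b)]
    dot = sum(x * y for x, y in zip(wa, wb))
    norm = math.sqrt(sum(x * x for x in wa)) * math.sqrt(sum(y * y for y in wb))
    return dot / norm if norm > 0 else 0.0


def make_window(rng, centre, error_rate, n=500):
    """Synthetic window: Gaussian feature values plus simulated 0/1 errors."""
    xs = [rng.gauss(centre, 1.0) for _ in range(n)]
    errors = [1 if rng.random() < error_rate else 0 for _ in range(n)]
    return xs, errors


rng = random.Random(0)
weights = [1.0, 1.0, 1.0, 1.0]  # assumption: uniform, not learned
threshold = 0.8                 # assumption: hypothetical drift threshold

# Stored representation of a known concept.
concept = fingerprint(*make_window(rng, centre=0.0, error_rate=0.1))

# A new window from the same concept, and one after a simulated drift.
same = fingerprint(*make_window(rng, centre=0.0, error_rate=0.1))
drifted = fingerprint(*make_window(rng, centre=5.0, error_rate=0.5))

sim_same = weighted_cosine(concept, same, weights)
sim_drift = weighted_cosine(concept, drifted, weights)

# Drift is flagged when the window no longer resembles the stored concept.
drift_on_same = sim_same < threshold
drift_on_drifted = sim_drift < threshold
print(drift_on_same, drift_on_drifted)
```

With only a few features, two different concepts can easily produce similar vectors, which is the weakness the abstract attributes to earlier representations. Adding more features helps only if uninformative ones are down-weighted, which is the role a learned weighting would play in place of the uniform `weights` above.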
Related benchmarks
| Task | Dataset | Kappa (κ) | Rank |
|---|---|---|---|
| Drift Detection and Model Selection | RTREE-U | 0.83 | 10 |
| Drift Detection and Model Selection | CMC | 0.3 | 10 |
| Drift Detection and Model Selection | UCI Wine | 0.26 | 10 |
| Drift Detection and Model Selection | HPLANE-U | 0.44 | 10 |
| Drift Detection and Model Selection | STAGGER | 0.98 | 10 |
| Concept Drift Detection | AQSex | 0.95 | 6 |
| Concept Drift Detection | RBF | 0.81 | 6 |
| Concept Drift Detection | Arabic | 0.9 | 6 |
| Concept Drift Detection | QG | 0.84 | 6 |
| Drift Detection and Model Selection | AQSex | 0.94 | 4 |