In-context Learning of Evolving Data Streams with Tabular Foundational Models
About
State-of-the-art data stream mining has long relied on ensembles of the Very Fast Decision Tree, a seminal algorithm honored with the 2015 KDD Test-of-Time Award. However, the emergence of large tabular models, i.e., transformers designed for structured numerical data, marks a significant paradigm shift. Rather than updating weights, these models learn in context through prompt tuning. By summarizing unbounded streaming data with on-the-fly sketches, one can feed this information into a pre-trained model for efficient processing. This work bridges advances from both areas, highlighting how transformers' implicit meta-learning abilities, pre-training on drifting natural data, and reliance on context optimization directly address the core challenges of adaptive learning in dynamic environments. Exploring real-time model adaptation, this research demonstrates that TabPFN, coupled with a simple sliding memory strategy, consistently outperforms ensembles of Hoeffding trees, such as Adaptive Random Forest and Streaming Random Patches, across all non-stationary benchmarks.
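The sliding memory strategy mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function `prequential_accuracy`, the window size, the synthetic `drifting_stream`, and the `NearestNeighbor` stand-in model are all assumptions made for illustration. The stand-in only mimics the `fit`/`predict` interface of an in-context learner; any model exposing that interface (for example, TabPFN's scikit-learn-style classifier) could presumably be passed in its place. The loop follows the usual test-then-train protocol: each arriving instance is first predicted from the current memory, then appended to it, and the oldest instance is dropped once the memory is full.

```python
from collections import deque
import random


class NearestNeighbor:
    """Hypothetical stand-in for an in-context learner (not TabPFN).

    It stores the context in fit() and predicts the label of the closest
    stored instance, so no weights are ever updated.
    """

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Index of the context instance with the smallest squared distance.
            best = min(
                range(len(self.X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, self.X[i])),
            )
            preds.append(self.y[best])
        return preds


def prequential_accuracy(stream, make_model, window_size=200):
    """Test-then-train evaluation with a fixed-size sliding memory."""
    memory = deque(maxlen=window_size)  # oldest instances are evicted automatically
    correct = total = 0
    for x, y in stream:
        if memory:
            # The current memory is the whole "context" given to the model.
            X_ctx = [features for features, _ in memory]
            y_ctx = [label for _, label in memory]
            model = make_model().fit(X_ctx, y_ctx)
            correct += int(model.predict([x])[0] == y)
            total += 1
        memory.append((x, y))  # the label is revealed only after predicting
    return correct / total if total else 0.0


def drifting_stream(n=600, seed=0):
    """Synthetic two-feature stream with one abrupt concept drift (assumed setup)."""
    rng = random.Random(seed)
    for t in range(n):
        x = (rng.random(), rng.random())
        threshold = 0.8 if t < n // 2 else 1.2  # decision boundary shifts halfway
        yield x, int(x[0] + x[1] > threshold)


acc = prequential_accuracy(drifting_stream(), NearestNeighbor, window_size=100)
print(f"prequential accuracy: {acc:.3f}")
```

Because the memory is bounded, instances from the old concept are flushed within `window_size` steps after the drift, which is what lets a frozen pre-trained model adapt without any gradient updates. Refitting on every instance keeps the sketch simple; a practical setup would likely predict in batches to reduce cost.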
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Data Stream Classification | CaDrift Dataset 4 | Accuracy | 86 | 7 |
| Data Stream Classification | CaDrift Dataset 5 | Accuracy | 96.85 | 7 |
| Data Stream Classification | CaDrift Dataset 6 | Accuracy | 80.26 | 7 |
| Data Stream Classification | Sea | Accuracy | 97.42 | 7 |
| Data Stream Classification | RandomRBF | Accuracy | 65.77 | 7 |
| Data Stream Classification | CaDrift Dataset 7 | Accuracy | 35.93 | 7 |
| Data Stream Classification | CaDrift Dataset 8 | Accuracy | 78.94 | 7 |
| Data Stream Classification | CaDrift Dataset 1 | Accuracy | 68.83 | 7 |
| Data Stream Classification | CaDrift Dataset 2 | Accuracy | 67.18 | 7 |
| Data Stream Classification | Sine | Accuracy | 85.77 | 7 |