TAB: Unified Benchmarking of Time Series Anomaly Detection Methods
About
Time series anomaly detection (TSAD) plays an important role in many domains such as finance, transportation, and healthcare. With the ongoing instrumentation of reality, more time series data will become available, which in turn increases the demand for TSAD. While many TSAD methods already exist, new and better methods are still desirable. However, effective progress hinges on reliable means of evaluating new methods and comparing them with existing ones. We address deficiencies in current evaluation procedures related to datasets, experimental settings, and protocols. Specifically, we propose a new time series anomaly detection benchmark, called TAB. First, TAB encompasses 29 public multivariate datasets and 1,635 univariate time series from different domains to facilitate more comprehensive evaluations on diverse datasets. Second, TAB covers a variety of TSAD methods, including non-learning, machine learning, deep learning, LLM-based, and time-series pre-trained methods. Third, TAB features a unified and automated evaluation pipeline that enables fair and easy evaluation of TSAD methods. Finally, we employ TAB to evaluate existing TSAD methods and report on the outcomes, thereby offering deeper insight into the performance of these methods. All datasets and code are available at https://github.com/decisionintelligence/TAB.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Multivariate Time Series Anomaly Detection | SMAP | Precision | 16.41 | 19 |
| Multivariate Time Series Anomaly Detection | SWaT | Precision | 9.26 | 19 |
| Multivariate Time Series Anomaly Detection | WADI | Precision | 0.1659 | 19 |
| Multivariate Time Series Anomaly Detection | PSM (Pooled Server Metrics) | ROC AUC | 71.31 | 8 |