RCAEval: A Benchmark for Root Cause Analysis of Microservice Systems with Telemetry Data
About
Root cause analysis (RCA) for microservice systems has gained significant attention in recent years. However, there is still no standard benchmark that combines large-scale datasets with a comprehensive evaluation environment. In this paper, we introduce RCAEval, an open-source benchmark that provides datasets and an evaluation environment for RCA in microservice systems. First, we introduce three datasets comprising 735 failure cases collected from three microservice systems, covering a variety of fault types observed in real-world failures. Second, we present an evaluation framework that includes fifteen reproducible baselines spanning a wide range of RCA approaches and supports both coarse-grained and fine-grained RCA evaluation. We hope that this ready-to-use benchmark will enable researchers and practitioners to conduct extensive analyses and pave the way for new, robust RCA solutions for microservice systems.
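RCA methods are commonly scored by Top-k accuracy: the fraction of failure cases whose true root cause appears among the first k candidates a method ranks. The following is a minimal sketch under two assumptions that are not taken from RCAEval itself. First, each method outputs one ranked candidate list per failure case. Second, coarse-grained RCA ranks services while fine-grained RCA ranks (service, indicator) pairs. The function name `top_k_accuracy`, the service names, and the indicator labels are hypothetical illustrations, not RCAEval's API or data.

```python
from typing import Hashable, Sequence


def top_k_accuracy(rankings: Sequence[Sequence[Hashable]],
                   truths: Sequence[Hashable], k: int) -> float:
    """Fraction of failure cases whose true root cause is in the top-k candidates.

    Hypothetical helper: works for any hashable candidate, so the same function
    scores service names (coarse-grained) or (service, indicator) tuples
    (fine-grained, under the assumption stated in the lead-in).
    """
    if len(rankings) != len(truths):
        raise ValueError("one ranking per failure case is required")
    if not truths:
        return 0.0
    hits = sum(1 for ranked, truth in zip(rankings, truths)
               if truth in list(ranked)[:k])
    return hits / len(truths)


# Hypothetical example: two failure cases, ranked at service granularity.
coarse_rankings = [["carts", "orders", "user"], ["payment", "carts", "orders"]]
coarse_truths = ["carts", "orders"]

# Hypothetical example: the same cases ranked at (service, indicator) granularity.
fine_rankings = [[("carts", "cpu"), ("carts", "mem")],
                 [("orders", "latency"), ("payment", "cpu")]]
fine_truths = [("carts", "mem"), ("orders", "latency")]

print(top_k_accuracy(coarse_rankings, coarse_truths, k=1))  # 0.5
print(top_k_accuracy(fine_rankings, fine_truths, k=2))      # 1.0
```

Keeping the candidate type generic means one scoring routine serves both granularities; only the shape of the ranked items changes.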
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Root Cause Analysis | RE3TT (Train Ticket, code-level faults) | F1@1 | 0.00e+0 | 9 |
| Root Cause Analysis | RCAEval Overall (all nine datasets, RE1OB-RE3TT) 1.0 | Top-1 Accuracy | 9 | 9 |
| Root Cause Analysis | RE2TT (Train Ticket, multimodal data) | CPU Top-1 | 0.00e+0 | 9 |
| Root Cause Analysis | RE3OB (Online Boutique, code-level faults) | F1 Top-1 Accuracy | 11 | 9 |
| Root Cause Analysis | RE3SS (Sock Shop, code-level faults) | F1 Top-1 | 0.2 | 8 |
| Root Cause Analysis | RE1SS (Sock Shop, unimodal data) | CPU Top-1 | 20 | 8 |
| Root Cause Analysis | RE1TT (Train Ticket, unimodal data) | CPU Top-1 | 0.00e+0 | 8 |
| Root Cause Analysis | RE2SS (Sock Shop, multimodal data; test) | CPU Top-1 Accuracy | 20 | 8 |
| Root Cause Analysis | RE1OB (Online Boutique, unimodal data) | CPU Top-1 Acc | 0.00e+0 | 8 |
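The dataset identifiers in this table appear to follow a fixed pattern. Each starts with a suite prefix (RE1, RE2 or RE3, which the table describes as unimodal data, multimodal data and code-level faults), followed by a system code (OB for Online Boutique, SS for Sock Shop, TT for Train Ticket). Three suites times three systems gives the nine datasets behind the "RE1OB-RE3TT" overall entry. The sketch below decodes an identifier under that reading of the table. The mapping is inferred from the table, not taken from RCAEval's code, and the names `describe` and `ALL_IDS` are hypothetical.

```python
import re

# Inferred from the leaderboard table above; not an official RCAEval mapping.
SYSTEMS = {"OB": "Online Boutique", "SS": "Sock Shop", "TT": "Train Ticket"}
SUITES = {
    "RE1": "unimodal data",
    "RE2": "multimodal data",
    "RE3": "code-level faults",
}

# Hypothetical identifier grammar: suite prefix followed by a system code.
_ID_PATTERN = re.compile(r"(RE[123])(OB|SS|TT)")


def describe(dataset_id: str) -> str:
    """Expand an identifier such as 'RE2TT' into a readable description."""
    match = _ID_PATTERN.fullmatch(dataset_id)
    if match is None:
        raise ValueError(f"unrecognised dataset id: {dataset_id!r}")
    suite, system = match.groups()
    return f"{SYSTEMS[system]} ({SUITES[suite]})"


# All suite/system combinations, in order from RE1OB to RE3TT.
ALL_IDS = [suite + system for suite in SUITES for system in SYSTEMS]

print(describe("RE2TT"))  # Train Ticket (multimodal data)
print(len(ALL_IDS))       # 9
```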