
Robust Failure Diagnosis of Microservice System through Multimodal Data

About

Automatic failure diagnosis is crucial for large microservice systems. Most current failure diagnosis methods rely on single-modal data, i.e., only metrics, only logs, or only traces. Through an empirical study of real-world failure cases, we show that combining these sources (multimodal data) leads to more accurate diagnosis. However, two challenges remain: representing these heterogeneous data effectively, and handling imbalanced failure types. To tackle them, we propose DiagFusion, a robust failure diagnosis approach based on multimodal data. DiagFusion uses embedding techniques and data augmentation to represent the multimodal data of service instances, combines deployment data and traces to build a dependency graph, and applies a graph neural network to localize the root cause instance and determine the failure type. Evaluations on real-world datasets show that DiagFusion outperforms existing methods in root cause instance localization (by 20.9% to 368%) and failure type determination (by 11.0% to 169%).
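The abstract says DiagFusion combines deployment data and traces into a dependency graph over service instances. The sketch below shows one plausible way to build such a graph. It is an illustrative assumption, not the authors' implementation. It assumes a hypothetical span format `(span_id, parent_span_id, instance)` and a hypothetical `host_of` mapping from each instance to its host. For every parent/child span pair that crosses instances, it adds a caller-to-callee edge. It also connects instances deployed on the same host in both directions. All function names are illustrative. The embedding, data augmentation, and graph neural network stages are not shown.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical span record: (span_id, parent_span_id or None, service_instance)
Span = Tuple[str, Optional[str], str]


def trace_edges(spans: List[Span]) -> Set[Tuple[str, str]]:
    """Derive caller -> callee edges between service instances from spans."""
    instance_of = {span_id: inst for span_id, _, inst in spans}
    edges: Set[Tuple[str, str]] = set()
    for _, parent_id, inst in spans:
        if parent_id is None or parent_id not in instance_of:
            continue  # root span, or parent span missing from the sample
        caller = instance_of[parent_id]
        if caller != inst:  # skip calls that stay within one instance
            edges.add((caller, inst))
    return edges


def deployment_edges(host_of: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Connect instances co-located on the same host, in both directions."""
    by_host: Dict[str, List[str]] = defaultdict(list)
    for inst, host in host_of.items():
        by_host[host].append(inst)
    edges: Set[Tuple[str, str]] = set()
    for insts in by_host.values():
        for a in insts:
            for b in insts:
                if a != b:
                    edges.add((a, b))
    return edges


def build_dependency_graph(
    spans: List[Span], host_of: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Merge trace and deployment edges into an adjacency list over instances."""
    graph: Dict[str, Set[str]] = {inst: set() for inst in host_of}
    for _, _, inst in spans:
        graph.setdefault(inst, set())  # include instances seen only in traces
    for src, dst in trace_edges(spans) | deployment_edges(host_of):
        graph[src].add(dst)
    return graph


if __name__ == "__main__":
    # Toy example: frontend-0 calls cart-0, which calls redis-0.
    spans = [
        ("s1", None, "frontend-0"),
        ("s2", "s1", "cart-0"),
        ("s3", "s2", "redis-0"),
    ]
    # cart-0 and redis-0 share node-b, so they are linked both ways.
    host_of = {"frontend-0": "node-a", "cart-0": "node-b", "redis-0": "node-b"}
    graph = build_dependency_graph(spans, host_of)
    print({k: sorted(v) for k, v in sorted(graph.items())})
```

Edges are stored in a set, so a pair linked by both a call and co-location (here `cart-0` to `redis-0`) appears only once. A real pipeline would likely also attach per-instance feature vectors to each node before passing the graph to a GNN.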

Shenglin Zhang, Pengxiang Jin, Zihan Lin, Yongqian Sun, Bicheng Zhang, Sibo Xia, Zhengdan Li, Zhenyu Zhong, Minghua Ma, Wa Jin, Dai Zhang, Zhenyu Zhu, Dan Pei (2023)

Related benchmarks

Task                      Dataset                         Metric           Result   Rank
Root Cause Localization   D2 (complete data conditions)   Top-1 Accuracy   58.2     7
Root Cause Localization   D1 (complete data conditions)   Top-1 Score      31       7
Failure Triage            D1 (complete data conditions)   Precision        67.5     6
Failure Triage            D2 (complete data conditions)   Precision        79.7     6
