TrioXpert: An Automated Incident Management Framework for Microservice System

About

Automated incident management plays a pivotal role in large-scale microservice systems. However, many existing methods rely on a single data modality (e.g., only metrics, logs, or traces) and struggle to address multiple downstream tasks at once, including anomaly detection (AD), failure triage (FT), and root cause localization (RCL). Moreover, current techniques often lack clear reasoning evidence, which limits their interpretability. To address these limitations, we propose TrioXpert, an end-to-end incident management framework capable of fully leveraging multimodal data. TrioXpert designs three independent data processing pipelines based on the inherent characteristics of the different modalities, characterizing the operational status of microservice systems along both numerical and textual dimensions. It employs a collaborative reasoning mechanism using large language models (LLMs) to handle multiple tasks simultaneously while providing clear reasoning evidence to ensure strong interpretability. We conducted extensive evaluations on two microservice system datasets, and the experimental results demonstrate that TrioXpert achieves outstanding performance in the AD (improving by 4.7% to 57.7%), FT (improving by 2.1% to 40.6%), and RCL (improving by 1.6% to 163.1%) tasks. TrioXpert has also been deployed in Lenovo's production environment, demonstrating substantial gains in diagnostic efficiency and accuracy.

Yongqian Sun, Yu Luo, Xidao Wen, Yuan Yuan, Xiaohui Nie, Shenglin Zhang, Tong Liu, Xi Luo • 2025

Related benchmarks

Task                      Dataset                        Result                 Rank
Root Cause Localization   D1 complete data conditions    Top-1 Score: 65.1      7
Root Cause Localization   D2 complete data conditions    Top-1 Accuracy: 55     7
Failure Triage            D1 complete data conditions    Precision: 85.2        6
Failure Triage            D2 complete data conditions    Precision: 81.4        6
Anomaly Detection         D1 complete data conditions    Precision: 88          6
Anomaly Detection         D2 complete data conditions    Precision: 85.4        6
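
For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how precision and top-k accuracy are conventionally computed. It uses the standard textbook definitions, not the paper's evaluation code. The function names `precision` and `top_k_accuracy` and the input layout are assumptions made for illustration, and the exact protocol used for these benchmarks is not stated on this page. The precision shown is the binary form that fits anomaly detection. Failure triage is a multi-class task and would need some per-class averaging, which is not shown here.

```python
from typing import Sequence


def precision(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Binary precision: the share of predicted positives that are truly positive."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    pred_pos = sum(1 for p in predicted if p)
    # Convention assumed here: precision is 0.0 when nothing is predicted positive.
    return true_pos / pred_pos if pred_pos else 0.0


def top_k_accuracy(
    ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int = 1
) -> float:
    """Share of incidents whose true root cause appears in the first k candidates."""
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        return 0.0
    hits = sum(1 for cands, t in zip(ranked, truth) if t in cands[:k])
    return hits / len(truth)
```

As a usage example with made-up inputs, `top_k_accuracy([["svc-a", "svc-b"], ["svc-c", "svc-d"]], ["svc-a", "svc-d"], k=1)` counts only the first incident as a hit, because the second incident's true root cause is ranked second. The service names are hypothetical.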