Beyond a Single Perspective: Text Anomaly Detection with Multi-View Language Representations

About

Text anomaly detection (TAD) plays a critical role in various language-driven real-world applications, including harmful content moderation, phishing detection, and spam review filtering. While two-step "embedding-detector" TAD methods have shown state-of-the-art performance, their effectiveness is often limited by the use of a single embedding model and the lack of adaptability across diverse datasets and anomaly types. To address these limitations, we propose to exploit the embeddings from multiple pretrained language models and integrate them into $MCA^2$, a multi-view TAD framework. $MCA^2$ adopts a multi-view reconstruction model to effectively extract normal textual patterns from multiple embedding perspectives. To exploit inter-view complementarity, a contrastive collaboration module is designed to leverage and strengthen the interactions across different views. Moreover, an adaptive allocation module is developed to automatically assign the contribution weight of each view, thereby improving the adaptability to diverse datasets. Extensive experiments on 10 benchmark datasets verify the effectiveness of $MCA^2$ against strong baselines. The source code of $MCA^2$ is available at https://github.com/yankehan/MCA2.

Yixin Liu, Kehan Yan, Shiyuan Li, Qingfeng Chen, Shirui Pan • 2026
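
The multi-view idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: random arrays stand in for embeddings from several pretrained language models, a per-view PCA reconstructor stands in for the multi-view reconstruction model, and a softmax over each view's relative training error stands in for the adaptive allocation module. The contrastive collaboration module is omitted entirely. All names and parameters (`fit_pca`, `recon_error`, `multi_view_scores`, `make_view`, `n_components`, `temperature`) are hypothetical.

```python
import numpy as np


def fit_pca(x_train, n_components):
    """Fit a PCA reconstructor on normal-only embeddings of one view."""
    mean = x_train.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    return mean, vt[:n_components]


def recon_error(x, mean, components):
    """Squared reconstruction error of each sample under one view's PCA."""
    centered = x - mean
    recon = centered @ components.T @ components
    return ((centered - recon) ** 2).sum(axis=1)


def multi_view_scores(train_views, test_views, n_components=8, temperature=1.0):
    """Combine per-view reconstruction errors into one anomaly score.

    Assumption: views that reconstruct normal data well get larger weights.
    This heuristic only stands in for a learned allocation of view weights.
    """
    models = [fit_pca(x, n_components) for x in train_views]
    train_err = [recon_error(x, *m) for x, m in zip(train_views, models)]
    # Fraction of each view's total variance left unexplained on normal data.
    rel = np.array([e.mean() / x.var(axis=0).sum()
                    for e, x in zip(train_err, train_views)])
    logits = -rel / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    scores = np.zeros(len(test_views[0]))
    for w, x, m, e in zip(weights, test_views, models, train_err):
        # Standardize with training statistics so views are comparable.
        scores += w * (recon_error(x, *m) - e.mean()) / (e.std() + 1e-12)
    return scores, weights


# Synthetic stand-in data (assumption): normal samples lie near a
# low-rank subspace, anomalous samples are isotropic Gaussian noise.
rng = np.random.default_rng(0)


def make_view(dim, n_train=200, n_test_normal=50, n_test_anomalous=10, rank=4):
    basis = rng.normal(size=(rank, dim))

    def normal(n):
        return rng.normal(size=(n, rank)) @ basis + 0.05 * rng.normal(size=(n, dim))

    test = np.vstack([normal(n_test_normal),
                      rng.normal(size=(n_test_anomalous, dim))])
    return normal(n_train), test


views = [make_view(32), make_view(48)]
train_views = [train for train, _ in views]
test_views = [test for _, test in views]
scores, weights = multi_view_scores(train_views, test_views)
print("view weights:", weights)
print("mean score (normal, anomalous):", scores[:50].mean(), scores[50:].mean())
```

In this toy setting the last 10 test rows are the anomalous ones, so their combined score should be far above that of the first 50. Swapping the random arrays for real sentence embeddings would require one array per embedding model, with rows aligned across views.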

Related benchmarks

Task	Dataset	Result
Text Anomaly Detection	AGNews	AUPRC 93.52	49
Text Anomaly Detection	NLPAD-AGNews	AUROC 94.84	25
Text Anomaly Detection	NLPAD-BBCNews	AUROC 0.986	25
Text Anomaly Detection	NLPAD MovieReview	AUROC 0.8381	25
Text Anomaly Detection	NLPAD-N24News	AUROC 96.56	25
Text Anomaly Detection	TAD-EmailSpam	AUROC 0.9895	25
Text Anomaly Detection	TAD-OLID	AUROC 0.6355	25
Text Anomaly Detection	TAD-HateSpeech	AUROC 0.7379	25
Text Anomaly Detection	TAD-CovidFake	AUROC 0.9776	25
Text Anomaly Detection	TAD-Liar2	AUROC 0.7965	25

Showing 10 of 20 rows
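
For readers unfamiliar with the two metrics in the table, the following sketch shows how AUROC and AUPRC can be computed from anomaly scores. It is a plain NumPy illustration, not the evaluation code behind the reported numbers. AUPRC is estimated here by average precision, which is one common estimator, and the function names `auroc` and `average_precision` are hypothetical. Both functions return values in [0, 1]; some rows in the table appear to use a 0 to 100 scale instead, in which case 0.986 would correspond to 98.6.

```python
import numpy as np


def auroc(labels, scores):
    """Probability that a random anomaly outscores a random normal sample.

    Ties between an anomaly and a normal sample count as half a win.
    Uses an O(n_pos * n_neg) pairwise comparison, fine for small inputs.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each true anomaly.

    Assumes scores contain no ties; with ties the result depends on order.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())


# Toy example: label 1 marks an anomaly, higher score means more anomalous.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_true, y_score))                         # 0.75
print(round(average_precision(y_true, y_score), 4))   # 0.8333
```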
