DATE: Detecting Anomalies in Text via Self-Supervision of Transformers

About

Deep learning models have seen widespread use for Anomaly Detection (AD) in recent years because they outperform traditional methods. Recent deep methods for detecting anomalies in images learn better features of normality in an end-to-end self-supervised setting: they train a model to discriminate between different transformations applied to visual data and then use the model's output to compute an anomaly score. We apply this approach to AD in text by introducing a novel pretext task on text sequences. We learn our DATE model end-to-end, enforcing two independent and complementary self-supervision signals, one at the token level and one at the sequence level. Under this new task formulation, we show strong quantitative and qualitative results on the 20Newsgroups and AG News datasets. In the semi-supervised setting, we outperform state-of-the-art results by +13.5% and +6.9% AUROC, respectively. In the unsupervised configuration, DATE surpasses all other methods even when 10% of its training data is contaminated with outliers (compared with 0% for the others).
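The abstract describes scoring anomalies from the output of a model trained to tell transformations apart, but it does not give the scoring formula. The sketch below illustrates one generic way such a score could be computed, and is not DATE's actual scoring rule: it assumes a classifier over K transformations and takes one minus the average probability assigned to the transformation that was really applied. The names `softmax` and `anomaly_score` and the toy logits are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def anomaly_score(logits_per_transform: Sequence[Sequence[float]]) -> float:
    """Generic transformation-discrimination score (an assumption, not DATE's rule).

    logits_per_transform[k] holds the classifier's logits over the K
    transformations when transformation k was applied to the input.
    Inputs that look like the training data should make the applied
    transformation easy to recognise, which gives a low score.
    """
    probs_correct = [
        softmax(logits)[k] for k, logits in enumerate(logits_per_transform)
    ]
    return 1.0 - sum(probs_correct) / len(probs_correct)


# Toy logits: the classifier is confident on the first input, clueless on the second.
normal_like = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
anomalous_like = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

print(anomaly_score(normal_like) < anomaly_score(anomalous_like))
```

With uniform logits the score equals 1 - 1/K, the value expected when the classifier cannot identify the applied transformation at all.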

Andrei Manolache, Florin Brad, Elena Burceanu • 2021

Related benchmarks

Task                    Dataset            Metric  Result  Rank
Text Anomaly Detection  TAD-SMSSpam        AUROC   0.967   25
Text Anomaly Detection  TAD-EmailSpam      AUROC   0.9638  25
Text Anomaly Detection  TAD-HateSpeech     AUROC   0.6009  25
Text Anomaly Detection  AGNews             AUPRC   33.32   25
Text Anomaly Detection  TAD-Liar2          AUROC   0.69    25
Text Anomaly Detection  NLPAD-AGNews       AUROC   78.43   25
Text Anomaly Detection  NLPAD-N24News      AUROC   66.09   25
Text Anomaly Detection  NLPAD-BBCNews      AUROC   0.8026  25
Text Anomaly Detection  NLPAD MovieReview  AUROC   0.4871  25
Text Anomaly Detection  TAD-OLID           AUROC   0.5194  25
Showing 10 of 15 rows
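
Most rows above report AUROC, which measures how often a randomly chosen anomaly receives a higher anomaly score than a randomly chosen normal sample. As a rough illustration of the metric only (not of how these benchmark numbers were produced), the following sketch computes it by pairwise comparison; the function name `auroc` and the toy scores are hypothetical.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via pairwise comparison; labels use 1 = anomaly, 0 = normal.

    Counts the fraction of (anomaly, normal) pairs in which the anomaly
    has the higher score, with ties counted as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four (anomaly, normal) pairs are ranked correctly.
toy_scores = [0.9, 0.3, 0.4, 0.1]
toy_labels = [1, 1, 0, 0]
print(auroc(toy_scores, toy_labels))
```

A value of 0.5 corresponds to random ranking, which is why results near 0.5 in the table indicate little separation between normal and anomalous text. The pairwise loop is quadratic in the number of samples, so it is only suitable for small illustrations.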
