
Towards Robustness Against Natural Language Word Substitutions

About

Robustness against word substitutions has a well-defined and widely accepted form, i.e., using semantically similar words as substitutions, and is thus considered a fundamental stepping stone towards broader robustness in natural language processing. Previous defense methods capture word substitutions in vector space using either an $l_2$-ball or a hyper-rectangle, which yields perturbation sets that are either not inclusive enough or unnecessarily large, and thus impedes the mimicry of worst cases for robust training. In this paper, we introduce a novel Adversarial Sparse Convex Combination (ASCC) method. We model the word substitution attack space as a convex hull and leverage a regularization term to push the perturbation towards an actual substitution, thus aligning our modeling better with the discrete textual space. Based on the ASCC method, we further propose ASCC-defense, which leverages ASCC to generate worst-case perturbations and incorporates adversarial training towards robustness. Experiments show that ASCC-defense outperforms the current state of the art in terms of robustness on two prevailing NLP tasks, i.e., sentiment analysis and natural language inference, against several attacks across multiple model architectures. Besides, we also envision a new class of defense towards robustness in NLP, in which our robustly trained word vectors can be plugged into a normally trained model and enforce its robustness without applying any other defense techniques.
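To make the convex-hull idea concrete, here is a minimal sketch of the inner "worst-case" search for a single word. It assumes the candidate set is the original word plus its allowed substitutes, parameterizes convex weights with a softmax, and uses an entropy penalty as the sparsity regularizer (the abstract only says "a regularization term"; entropy is our assumption). A fixed vector `loss_direction` is a hypothetical stand-in for the model's loss gradient, so the toy loss is linear. All helper names, embeddings, and hyperparameters are illustrative, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Map unconstrained logits to convex weights (non-negative, summing to 1)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def convex_combination(weights: Sequence[float], vectors: Sequence[Vector]) -> Vector:
    """Return sum_j w_j * v_j, a point inside the convex hull of the candidate vectors."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def entropy(weights: Sequence[float]) -> float:
    """Shannon entropy; it reaches 0 only when all weight sits on one candidate."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def worst_case_weights(
    candidates: Sequence[Vector],
    loss_direction: Vector,  # hypothetical stand-in for a model's loss gradient
    beta: float = 0.1,       # assumed strength of the entropy (sparsity) penalty
    lr: float = 1.0,
    steps: int = 200,
) -> List[float]:
    """Gradient ascent on softmax logits to maximize J(w) = u . x(w) - beta * H(w)."""
    # Toy linear loss: each candidate's score is s_j = u . v_j
    scores = [sum(ui * vi for ui, vi in zip(loss_direction, v)) for v in candidates]
    logits = [0.0] * len(candidates)  # uniform start, i.e. the hull's barycenter
    for _ in range(steps):
        w = softmax(logits)
        # dJ/dw_j = s_j + beta * (log w_j + 1); the max() guards against log(0) on underflow
        g = [s + beta * (math.log(max(wj, 1e-12)) + 1.0) for s, wj in zip(scores, w)]
        g_mean = sum(wj * gj for wj, gj in zip(w, g))
        # Chain rule through softmax: dJ/dz_k = w_k * (g_k - sum_j w_j g_j)
        logits = [z + lr * wk * (gk - g_mean) for z, wk, gk in zip(logits, w, g)]
    return softmax(logits)


# Made-up 2-D embeddings: the original word followed by three allowed substitutes.
candidates = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
u = [1.0, 0.2]
w = worst_case_weights(candidates, u)
x_adv = convex_combination(w, candidates)
print([round(wj, 3) for wj in w], [round(xd, 3) for xd in x_adv])
```

Because the penalty lowers the objective for spread-out weights, the ascent drives the weights towards a single vertex of the hull, so the perturbed embedding `x_adv` lands near an actual substitute rather than at an arbitrary interior point. In an adversarial-training loop of the kind ASCC-defense describes, such a perturbed embedding would replace the clean one when computing the training loss; that outer loop is omitted here.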

Xinshuai Dong, Anh Tuan Luu, Rongrong Ji, Hong Liu · 2021

Related benchmarks

Task | Dataset | Metric | Result | Rank
Natural Language Inference | SNLI (test) | Accuracy | 87.1 | 681
Natural Language Inference | SNLI | Accuracy | 87.1 | 174
Text Classification | Yahoo! Answers (test) | Clean Accuracy | 70.7 | 133
Text Classification | IMDB (test) | CA | 87.8 | 79
Sentiment Classification | IMDB | Accuracy | 80.1 | 41
Sentiment Analysis | IMDB (test) | Clean Accuracy (%) | 92.62 | 37
Rumor Detection | Pheme | DeepWordBug ASR | 44.53 | 16
Harmful Content Detection | PHEME New Attacks: ExplainDrive (test) | Accuracy | 79.88 | 15
Sentiment Analysis | IMDB (test) | Genetic Score | 74.8 | 10
Harmful Content Detection | PHEME Known Attacks: DeepWordBug, TFAdjusted, TREPAT (test) | Accuracy | 81.15 | 10

Showing 10 of 11 rows.
