
BAE: BERT-based Adversarial Examples for Text Classification

About

Modern text classification models are susceptible to adversarial examples: perturbed versions of the original text, indiscernible to humans, which get misclassified by the model. Recent works in NLP use rule-based synonym replacement strategies to generate adversarial examples. These strategies can lead to out-of-context and unnaturally complex token replacements, which are easily identifiable by humans. We present BAE, a black-box attack for generating adversarial examples using contextual perturbations from a BERT masked language model. BAE replaces and inserts tokens in the original text by masking a portion of the text and leveraging the BERT-MLM to generate alternatives for the masked tokens. Through automatic and human evaluations, we show that BAE performs a stronger attack, in addition to generating adversarial examples with improved grammaticality and semantic coherence, as compared to prior work.
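The replace-mode attack described above can be sketched as a simple loop: mask each token, ask a masked language model for contextual alternatives, and keep the first perturbation that flips the victim classifier's prediction. The sketch below uses stub functions in place of the real BERT-MLM and victim model (both stubs are assumptions for illustration, not the paper's code), and omits the paper's token-importance ranking and semantic-similarity filtering.

```python
# Hedged sketch of a BAE-style replace attack loop.
# `mlm_candidates` and `classify` are illustrative stubs: a real attack
# would query a BERT masked language model and a trained classifier.

def mlm_candidates(tokens, i, k=3):
    # Stub: a BERT-MLM would return top-k fills for the sentence with
    # tokens[i] masked; here we use a tiny hand-written lookup.
    fills = {"good": ["great", "fine", "nice"],
             "bad": ["poor", "awful", "terrible"]}
    return fills.get(tokens[i], [tokens[i]])[:k]

def classify(tokens):
    # Stub victim classifier: predicts 1 (positive) if a "positive"
    # keyword is present, else 0.
    return 1 if any(t in {"good", "great"} for t in tokens) else 0

def bae_replace(tokens, true_label):
    """Replace one token at a time with MLM candidates until the
    classifier's prediction flips away from the true label."""
    for i in range(len(tokens)):
        for cand in mlm_candidates(tokens, i):
            perturbed = tokens[:i] + [cand] + tokens[i + 1:]
            if classify(perturbed) != true_label:
                return perturbed  # adversarial example found
    return None  # attack failed on this input

adv = bae_replace(["the", "movie", "was", "good"], true_label=1)
```

The insert mode works analogously: a mask token is added next to position `i` instead of replacing `tokens[i]`, and the MLM proposes a new token for that slot.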

Siddhant Garg, Goutham Ramakrishnan • 2020

Related benchmarks

Task | Dataset | Metric | Result | Rank
Question Classification | TREC | Accuracy | 97.6 | 205
Counterfactual Generation | SNLI Hypothesis | LFR | 66.5 | 37
Counterfactual Generation | SNLI Premise | LFR | 0.518 | 37
Counterfactual Generation | AG-News | LFR | 0.443 | 37
Counterfactual Generation | IMDB | LFR | 63.7 | 37
Text Classification | Emotion | ASR (%) | 0.3295 | 36
Counterfactual Generation | SST2 (test) | SLFR | 47 | 29
Counterfactual Generation | AG News (test) | SLFR | 19.5 | 29
Question Answering | HotpotQA (train test) | BLEU | 50.99 | 4
Question Answering | TruthfulQA (train test) | BLEU | 0.5167 | 4

(Showing 10 of 12 rows)
