InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective
About
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks. Recent studies, however, show that such BERT-based models are vulnerable to textual adversarial attacks. We aim to address this problem from an information-theoretic perspective, and propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models. InfoBERT contains two mutual-information-based regularizers for model training: (i) an Information Bottleneck regularizer, which suppresses noisy mutual information between the input and the feature representation; and (ii) a Robust Feature regularizer, which increases the mutual information between local robust features and global features. We provide a principled way to theoretically analyze and improve the robustness of representation learning for language models in both standard and adversarial training. Extensive experiments demonstrate that InfoBERT achieves state-of-the-art robust accuracy over several adversarial datasets on Natural Language Inference (NLI) and Question Answering (QA) tasks. Our code is available at https://github.com/AI-secure/InfoBERT.
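Both regularizers hinge on estimating mutual information, which in practice is done with variational bounds. The sketch below illustrates the general idea with an InfoNCE-style lower bound and a combined objective; it is a minimal illustration only, not the authors' implementation (see the repository above for that), and the function names and the coefficients `alpha` and `beta` are illustrative assumptions.

```python
import numpy as np

def infonce_lower_bound(scores: np.ndarray) -> float:
    """InfoNCE lower bound on mutual information.

    scores[i, j] is a critic score between sample i of one variable and
    sample j of the other; diagonal entries correspond to positive pairs.
    The bound is log(N) plus the mean diagonal log-softmax score.
    """
    n = scores.shape[0]
    # Row-wise log-softmax, keeping only the positive-pair (diagonal) terms.
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(np.log(n) + np.mean(np.diag(log_probs)))

def infobert_style_loss(task_loss: float, ib_term: float,
                        robust_feature_term: float,
                        alpha: float = 0.1, beta: float = 0.1) -> float:
    """Hypothetical combined objective (coefficients are illustrative):
    minimize the task loss and the input-representation MI estimate,
    while maximizing the local-to-global robust-feature MI estimate."""
    return task_loss + alpha * ib_term - beta * robust_feature_term
```

With uninformative (all-equal) critic scores the bound is 0, while a strongly diagonal score matrix (positives scored far above negatives) pushes it toward log(N), matching the intuition that the bound measures how well positive pairs are identified among negatives.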
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Natural Language Inference | SNLI | Accuracy: 93.3 | 174 |
| Text Classification | AGNews | Clean Accuracy: 94.81 | 118 |
| Natural Language Inference | MNLI | Accuracy (matched): 90.7 | 80 |
| Text Classification | IMDB (test) | Clean Accuracy: 92 | 79 |
| Sentiment Analysis | SST-2 (test) | Clean Accuracy: 92.9 | 50 |
| Sentiment Analysis | IMDB (test) | Clean Accuracy (%): 94.18 | 37 |
| Text Classification | IMDB | Clean Accuracy: 95.2 | 32 |
| Natural Language Inference | ANLI (test) | Overall Score: 58.3 | 28 |
| Natural Language Inference | QNLI (test) | -- | 27 |
| Text Classification | AGNews (test) | Accuracy (Clean): 95.5 | 15 |