Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning
About
State-of-the-art natural language understanding classification models follow a two-stage approach: pre-training a large language model on an auxiliary task, and then fine-tuning the model on a task-specific labeled dataset using the cross-entropy loss. However, the cross-entropy loss has several shortcomings that can lead to sub-optimal generalization and instability. Driven by the intuition that good generalization requires capturing the similarity between examples in one class and contrasting them with examples in other classes, we propose a supervised contrastive learning (SCL) objective for the fine-tuning stage. Combined with cross-entropy, the proposed SCL loss obtains significant improvements over a strong RoBERTa-Large baseline on multiple datasets of the GLUE benchmark in few-shot learning settings, without requiring specialized architectures, data augmentations, memory banks, or additional unsupervised data. The proposed fine-tuning objective leads to models that are more robust to different levels of noise in the fine-tuning training data, and that generalize better to related tasks with limited labeled data.
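To make the objective concrete, below is a minimal PyTorch sketch of a batch-wise supervised contrastive term combined with cross-entropy. The function names (`scl_loss`, `combined_loss`), the default temperature and weighting values, and the use of L2-normalized encoder outputs are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn.functional as F


def scl_loss(features, labels, temperature=0.3):
    """Supervised contrastive loss over a batch of sentence embeddings.

    features: (batch, dim) encoder outputs, e.g. RoBERTa [CLS] vectors
    labels:   (batch,) integer class labels
    """
    features = F.normalize(features, dim=1)          # work in cosine-similarity space
    sim = features @ features.T / temperature        # pairwise similarities
    batch = features.size(0)
    self_mask = torch.eye(batch, dtype=torch.bool, device=features.device)
    sim.masked_fill_(self_mask, float("-inf"))       # exclude self-similarity
    log_prob = F.log_softmax(sim, dim=1)             # normalize over all other examples

    # positives: same label, different index
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    # average log-probability of the positives for each anchor
    per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts)

    has_pos = pos_mask.any(dim=1)                    # anchors with no positive contribute nothing
    if not has_pos.any():
        return features.new_zeros(())
    return per_anchor[has_pos].mean()


def combined_loss(logits, features, labels, lam=0.5, temperature=0.3):
    """Weighted combination of cross-entropy and the SCL term."""
    return (1 - lam) * F.cross_entropy(logits, labels) + lam * scl_loss(
        features, labels, temperature
    )
```

In a fine-tuning loop, `logits` would come from the classification head and `features` from the encoder's pooled representation for the same batch; the mixing weight and temperature are hyperparameters to tune per task.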
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-100 | Top-1 Accuracy | 81.49 | 622 |
| Image Classification | DTD | Accuracy | 72.73 | 487 |
| Image Classification | CIFAR-10 | -- | -- | 471 |
| Image Classification | Aircraft | Accuracy | 87.44 | 302 |
| Image Classification | Oxford-IIIT Pets | Accuracy | 89.71 | 259 |
| Image Classification | Caltech-101 | Accuracy | 92.84 | 198 |
| Image Classification | FGVC Aircraft | Top-1 Accuracy | 87.44 | 185 |
| Emotion Recognition in Conversation | MELD | Weighted Avg F1 | 65.63 | 137 |
| Conversational Emotion Recognition | IEMOCAP | Weighted Average F1 Score | 68.14 | 129 |
| Image Classification | Flowers | Top-1 Accuracy | 98.65 | 80 |