Accelerating BERT Inference for Sequence Labeling via Early-Exit
About
Both performance and efficiency are crucial for sequence labeling tasks in many real-world scenarios. Although pre-trained models (PTMs) have significantly improved performance on various sequence labeling tasks, they are computationally expensive. To alleviate this problem, we extend the recently successful early-exit mechanism to accelerate PTM inference for sequence labeling tasks. Existing early-exit mechanisms, however, are designed specifically for sequence-level tasks rather than sequence labeling. In this paper, we first propose a simple extension of sentence-level early-exit to sequence labeling tasks. To further reduce the computational cost, we also propose a token-level early-exit mechanism that allows some tokens to exit early at different layers. Considering the local dependency inherent in sequence labeling, we employ a window-based criterion to decide whether a token should exit. Token-level early-exit introduces a gap between training and inference, so we add an extra self-sampling fine-tuning stage to alleviate it. Extensive experiments on three popular sequence labeling tasks show that our approach can save up to 66%-75% of inference cost with minimal performance degradation. Compared with competitive compressed models such as DistilBERT, our approach achieves better performance at the same speed-up ratios of 2X, 3X, and 4X.
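The window-based exit criterion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how uncertainty is measured or aggregated, so the choices here are assumptions made for illustration: per-token uncertainty is taken to be normalized entropy of the predicted label distribution, and a token may exit only if the maximum uncertainty inside its window falls below a threshold. The names `normalized_entropy`, `window_exit_mask`, `threshold`, and `window` are hypothetical and not taken from the paper's code.

```python
import math
from typing import List


def normalized_entropy(probs: List[float]) -> float:
    """Entropy of a label distribution, scaled to [0, 1] (assumed uncertainty measure)."""
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)


def window_exit_mask(token_probs: List[List[float]],
                     threshold: float,
                     window: int = 1) -> List[bool]:
    """Return True for each token allowed to exit at the current layer.

    A token exits only if every token within `window` positions of it
    (including itself) has uncertainty below `threshold`. Taking the max
    over the window is an assumption of this sketch.
    """
    u = [normalized_entropy(p) for p in token_probs]
    mask = []
    for i in range(len(u)):
        lo = max(0, i - window)
        hi = min(len(u), i + window + 1)
        mask.append(max(u[lo:hi]) < threshold)
    return mask


# Toy label distributions for five tokens at one intermediate layer.
probs = [
    [0.98, 0.01, 0.01],    # confident
    [0.97, 0.02, 0.01],    # confident
    [0.40, 0.35, 0.25],    # uncertain
    [0.96, 0.02, 0.02],    # confident
    [0.99, 0.005, 0.005],  # confident
]

mask = window_exit_mask(probs, threshold=0.3, window=1)
print(mask)  # [True, False, False, False, True]
```

With `window=0` each token is judged in isolation, and every confident token exits. With `window=1` the uncertain third token also holds back its two neighbours, which reflects the local dependency between adjacent labels.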
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Named Entity Recognition | | F1 Score | 64.11 | 27 |
| Named Entity Recognition | OntoNotes 4.0 | F1 Score | 78.98 | 18 |
| Named Entity Recognition | Twitter NER | F1 Score | 77.77 | 14 |
| Chinese Word Segmentation | CTB Seg 5 | F1 Score | 98.46 | 3 |
| POS Tagging | ARK Twitter | Accuracy | 91.38 | 3 |
| Chinese Word Segmentation | UD Seg | F1 Score | 0.9751 | 2 |
| Named Entity Recognition | CLUE NER | F1 Score | 75.95 | 2 |
| POS Tagging | CTB POS 5 | F1 Score | 94.91 | 2 |
| POS Tagging | UD POS | F1 Score | 91.01 | 2 |