
Learning Light-Weight Translation Models from Deep Transformer

About

Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory intensive. In this paper, we take a natural step towards learning strong but light-weight NMT systems. We propose a novel group-permutation based knowledge distillation approach to compressing the deep Transformer model into a shallow model. Experimental results on several benchmarks validate the effectiveness of our method. Our compressed model is 8× shallower than the deep model, with almost no loss in BLEU. To further enhance the teacher model, we present a Skipping Sub-Layer method that randomly omits sub-layers to introduce perturbation into training, which achieves a BLEU score of 30.63 on English-German newstest2014. The code is publicly available at https://github.com/libeineu/GPKD.

Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu • 2020
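To make the abstract's two ideas concrete, below is a minimal PyTorch sketch, not the authors' implementation (the actual code is at https://github.com/libeineu/GPKD): a word-level knowledge-distillation loss with which a shallow student imitates the deep teacher's output distribution, and a wrapper that randomly skips a residual sub-layer during training, in the spirit of the Skipping Sub-Layer method. All names here (SkipSubLayer, distillation_loss, skip_prob) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkipSubLayer(nn.Module):
    """Wraps one sub-layer (e.g. self-attention or FFN) and randomly
    omits it during training, keeping only the residual path."""

    def __init__(self, sublayer: nn.Module, skip_prob: float = 0.1):
        super().__init__()
        self.sublayer = sublayer
        self.skip_prob = skip_prob

    def forward(self, x):
        if self.training and torch.rand(()) < self.skip_prob:
            return x  # sub-layer skipped: identity via the residual connection
        return x + self.sublayer(x)  # normal residual sub-layer


def distillation_loss(student_logits, teacher_logits, temperature: float = 1.0):
    """Word-level knowledge distillation: KL divergence between the shallow
    student's and the deep teacher's temperature-softened distributions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)


if __name__ == "__main__":
    # Toy check that both pieces run.
    ffn = SkipSubLayer(
        nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16)),
        skip_prob=0.5)
    ffn.train()
    h = ffn(torch.randn(8, 16))  # sub-layer applied or skipped at random

    vocab = 100
    teacher_logits = torch.randn(8, 20, vocab)                      # stand-in for a frozen deep teacher
    student_logits = torch.randn(8, 20, vocab, requires_grad=True)  # shallow student
    loss = distillation_loss(student_logits, teacher_logits, temperature=2.0)
    loss.backward()
```

In the paper's setting the student is 8× shallower than the teacher; the group-permutation step, which determines how the deep teacher's layers map onto the shallow student, is beyond this sketch and is covered in the repository.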

Related benchmarks

Task                    Dataset                                       Result           Rank
UAV Tracking            VisDrone 2018                                 Precision 83.4   32
UAV Tracking            UAVDT                                         Precision 77.3   32
UAV Tracking            DTB70                                         Precision 0.726  32
Visual Object Tracking  UAV123                                        SUC 47.7         25
Object Tracking         Average (DTB70, UAVDT, VisDrone2018, UAV123)  Precision 75.9   17
