Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification
About
Extreme multi-label text classification (XMC) seeks to find relevant labels from an extremely large label collection for a given text input. Many real-world applications can be formulated as XMC problems, such as recommendation systems, document tagging, and semantic search. Recently, transformer-based XMC methods, such as X-Transformer and LightXML, have shown significant improvement over other XMC methods. Despite leveraging pre-trained transformer models for text representation, fine-tuning transformer models on a large label space still takes considerable computational time, even with powerful GPUs. In this paper, we propose a novel recursive approach, XR-Transformer, to accelerate the procedure by recursively fine-tuning transformer models on a series of multi-resolution objectives related to the original XMC objective function. Empirical results show that XR-Transformer takes significantly less training time than other transformer-based XMC models while achieving new state-of-the-art results. In particular, on the public Amazon-3M dataset with 3 million labels, XR-Transformer is not only 20x faster than X-Transformer but also improves Precision@1 from 51% to 54%.
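The idea of a series of multi-resolution objectives can be illustrated with a small sketch. It is not the paper's implementation: it assumes, purely for illustration, that labels are grouped into coarser clusters by contiguous index (a real system would more plausibly cluster labels by similarity), and the names `coarsen_labels` and `build_resolutions` are hypothetical. Each coarser level marks a cluster as relevant to an instance whenever any of the instance's finer labels falls in that cluster.

```python
def coarsen_labels(label_sets, label_to_cluster):
    """Map each instance's labels to the sorted set of clusters that contain them."""
    return [sorted({label_to_cluster[label] for label in labels})
            for labels in label_sets]


def build_resolutions(label_sets, num_labels, branching=2):
    """Return per-instance targets ordered from coarsest to finest resolution.

    Assumption (illustrative only): labels are merged by contiguous index,
    `branching` labels per cluster, rather than by a learned clustering.
    """
    levels = [label_sets]          # finest resolution: the original labels
    current, size = label_sets, num_labels
    while size > branching:
        mapping = {label: label // branching for label in range(size)}
        current = coarsen_labels(current, mapping)
        size = (size + branching - 1) // branching
        levels.append(current)
    return levels[::-1]            # coarsest first


# Three instances over 8 labels, merged two labels per cluster at each level.
label_sets = [[0, 5], [2, 3], [7]]
for targets in build_resolutions(label_sets, num_labels=8, branching=2):
    print(targets)
```

In a recursive scheme of this kind, a model would be fine-tuned on the coarsest targets first and then reused as the starting point for each finer level, so that most of the training happens on much smaller output spaces.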
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Extreme Multi-label Classification | Amazon-670K | P@1 | 50.11 | 41 |
| Extreme Multi-label Classification | Amazon-3M | Precision@1 | 54.2 | 33 |
| Extreme Classification | LF-AmazonTitles-131K | P@1 | 38.49 | 32 |
| Extreme Multi-label Classification | Wiki-500K | P@1 | 79.4 | 30 |
| Extreme Multi-label Classification | Wiki10-31K | PSP@1 | 12.25 | 21 |
| Extreme Multi-label Classification | AmazonCat-13K | PSP@1 | 50.72 | 21 |
| Extreme Multi-label Classification | AmazonCat-13K legacy (test) | Precision@1 | 0.9679 | 11 |
| Extreme Multi-label Classification | Wiki10-31K legacy (test) | P@1 | 88.69 | 11 |
| Extreme Multi-label Classification | Amazon-670K large scale XMC (test) | PSP@1 | 36.16 | 9 |
| Extreme Multi-label Classification | Eurlex-4K | Training Time (hours) | 0.8 | 8 |
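
For readers unfamiliar with the Precision@k metric behind the P@1 and Precision@1 entries above, the following is a minimal sketch of how it is commonly computed: the fraction of the k highest-scored labels that are truly relevant, averaged over instances. The function names and the toy scores are illustrative assumptions, not taken from the paper or from any benchmark's evaluation code.

```python
def precision_at_k(scores, relevant, k=1):
    """Fraction of the k highest-scored labels that are in the relevant set."""
    top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    hits = sum(1 for j in top_k if j in relevant)
    return hits / k


def mean_precision_at_k(all_scores, all_relevant, k=1):
    """Average Precision@k over all instances."""
    values = [precision_at_k(s, r, k) for s, r in zip(all_scores, all_relevant)]
    return sum(values) / len(values)


# Toy example: two instances scored over three labels.
all_scores = [[0.1, 0.9, 0.3], [0.8, 0.2, 0.5]]
all_relevant = [{1}, {2}]
print(mean_precision_at_k(all_scores, all_relevant, k=1))
```

Note that the table mixes scales: most entries are percentages, while the AmazonCat-13K legacy entry (0.9679) is expressed as a fraction.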