LlamBERT: Large-scale low-cost data annotation in NLP

About

Large Language Models (LLMs), such as GPT-4 and Llama 2, show remarkable proficiency in a wide range of natural language processing (NLP) tasks. Despite their effectiveness, the high costs associated with their use pose a challenge. We present LlamBERT, a hybrid approach that leverages LLMs to annotate a small subset of large, unlabeled databases and uses the results for fine-tuning transformer encoders like BERT and RoBERTa. This strategy is evaluated on two diverse datasets: the IMDb review dataset and the UMLS Meta-Thesaurus. Our results indicate that the LlamBERT approach slightly compromises on accuracy while offering much greater cost-effectiveness.
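The two-stage idea in the abstract (an LLM labels a small subset, a cheaper model trained on those labels handles the rest) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `hypothetical_llm_annotate` is a hypothetical keyword rule standing in for a real LLM prompt (e.g. to Llama 2), and a small naive Bayes classifier is swapped in for the fine-tuned BERT/RoBERTa encoder so the example runs without dependencies. All function names and the toy corpus are illustrative.

```python
import math
import random
from collections import Counter
from typing import Callable, List


def hypothetical_llm_annotate(text: str) -> int:
    """Hypothetical stand-in for an expensive LLM prompt (e.g. to Llama 2).

    A keyword rule keeps the sketch offline and deterministic.
    Returns 1 (positive) or 0 (negative); ties default to positive.
    """
    positive = {"great", "excellent", "loved", "wonderful"}
    negative = {"awful", "boring", "terrible", "hated"}
    words = set(text.lower().split())
    return 1 if len(words & positive) >= len(words & negative) else 0


class NaiveBayes:
    """Cheap student model. Swapped in for a fine-tuned BERT/RoBERTa
    encoder only so the sketch has no third-party dependencies."""

    def fit(self, texts: List[str], labels: List[int]) -> "NaiveBayes":
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text: str) -> int:
        n_docs = sum(self.class_counts.values())
        best_label, best_score = 0, -math.inf
        for label in (0, 1):
            total = sum(self.word_counts[label].values())
            # Laplace-smoothed log prior plus log likelihood of each word
            score = math.log((self.class_counts[label] + 1) / (n_docs + 2))
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def llambert_style_pipeline(
    corpus: List[str],
    annotate: Callable[[str], int],
    subset_size: int,
    seed: int = 0,
) -> List[int]:
    rng = random.Random(seed)
    # 1. Draw a small random subset of the unlabeled corpus.
    subset = rng.sample(corpus, subset_size)
    # 2. Only the subset goes through the expensive annotator.
    labels = [annotate(text) for text in subset]
    # 3. Train the cheap student model on the annotator's labels.
    student = NaiveBayes().fit(subset, labels)
    # 4. Label the entire corpus with the cheap student model.
    return [student.predict(text) for text in corpus]


# Toy demo: 40 "unlabeled" reviews, only 8 of which reach the annotator.
positives = ["a great and wonderful film", "loved it excellent acting"]
negatives = ["an awful and boring film", "hated it terrible acting"]
corpus = (positives + negatives) * 10
llm_calls: List[str] = []


def counting_annotate(text: str) -> int:
    llm_calls.append(text)  # record each (costly) annotator call
    return hypothetical_llm_annotate(text)


predictions = llambert_style_pipeline(corpus, counting_annotate, subset_size=8)
print(f"LLM calls: {len(llm_calls)} for {len(corpus)} documents")
# → LLM calls: 8 for 40 documents
```

The point the sketch isolates is that the number of expensive annotator calls scales with `subset_size` rather than with the corpus size; the accuracy of the final labels then depends on how well the cheaper model learns from that subset.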

Bálint Csanády, Lajos Muzsai, Péter Vedres, Zoltán Nádasdy, András Lukács • 2024

Related benchmarks

Task | Dataset | Result | Rank
Image Classification | MNIST | Accuracy 99.87 | 395
Sentiment Analysis | IMDB (test) | Accuracy 96.68 | 248
Image Classification | Fashion MNIST | Accuracy 96.91 | 225
UMLS classification | UMLS (test) | Accuracy 96.92 | 9

Other info

Code