The ALCHEmist: Automated Labeling 500x CHEaper Than LLM Data Annotators

About

Large pretrained models can be used as annotators, helping replace or augment crowdworkers and enabling the distillation of generalist models into smaller specialist models. Unfortunately, this comes at a cost: employing top-of-the-line models often requires paying thousands of dollars for API calls, while the resulting datasets are static and challenging to audit. To address these challenges, we propose a simple alternative: rather than directly querying labels from pretrained models, we task models to generate programs that can produce labels. These programs can be stored and applied locally, re-used and extended, and cost orders of magnitude less. Our system, Alchemist, matches or exceeds the performance of large language model-based annotation across a range of tasks at a fraction of the cost: performance improves by 12.9% on average, while total labeling costs across all datasets fall by a factor of approximately 500.
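
To make the idea of "programs that can produce labels" concrete, here is a minimal sketch in Python. It is not the authors' implementation. The three labeling programs are hypothetical examples of what a model might be prompted to write for SMS spam detection, and the majority vote is an assumed, simple stand-in for whatever aggregation a real system would use.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = None  # a program may decline to label an example

# Hypothetical labeling programs (1 = spam, 0 = not spam).
# In the setting described above, a pretrained model would be asked to
# write functions like these once; they are then run locally for free.
def lf_prize_keywords(text: str) -> Optional[int]:
    keywords = ("free", "winner", "prize", "claim")
    return 1 if any(k in text.lower() for k in keywords) else ABSTAIN

def lf_contains_url(text: str) -> Optional[int]:
    return 1 if "http://" in text or "www." in text else ABSTAIN

def lf_short_personal(text: str) -> Optional[int]:
    words = text.lower().split()
    is_personal = any(w in ("i", "me", "you") for w in words)
    return 0 if len(words) < 8 and is_personal else ABSTAIN

def majority_vote(votes: List[Optional[int]]) -> Optional[int]:
    """Combine program outputs; abstain when no program votes or on a tie."""
    cast = [v for v in votes if v is not ABSTAIN]
    if not cast:
        return ABSTAIN
    counts = Counter(cast).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

def label_dataset(
    texts: List[str],
    programs: List[Callable[[str], Optional[int]]],
) -> List[Optional[int]]:
    """Apply every stored program to every example and aggregate the votes."""
    return [majority_vote([p(t) for p in programs]) for t in texts]

PROGRAMS = [lf_prize_keywords, lf_contains_url, lf_short_personal]
```

Because the programs are ordinary functions, they can be inspected, edited, and re-applied to new data without further API calls, which is the auditability and cost argument made above. Calling `label_dataset(texts, PROGRAMS)` returns one label (or `None` for an abstention) per input text.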

Tzu-Heng Huang, Catherine Cao, Vaishnavi Bhargava, Frederic Sala • 2024

Related benchmarks

Task	Dataset	Metric	Result
Image Classification	Waterbirds (test)	Worst-Group Accuracy	46.7	127
Sentiment Analysis	IMDB	Accuracy	66.2	73
High-stakes specialized classification	ChemProt (test)	Macro F1	38.03	49
General classification	Banking77 (test)	Macro F1	27.59	49
Text Classification	SMS	--	--	45
Complex Reasoning	VitaminC (test)	Macro F1	72.67	37
Multi-label biomedical classification	PubMed (test)	Macro F1	56.84	37
General classification	AGNews (test)	Macro F1	87.05	37
High-stakes specialized classification	DDI (test)	Macro F1	27.16	37
High-stakes specialized classification	Claude9 (test)	Macro F1	32.4	37

Showing 10 of 36 rows
