RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science

About

Large language models (LLMs) have enhanced our ability to rapidly analyze and classify unstructured natural language data. However, concerns regarding cost, network limitations, and security constraints have posed challenges for their integration into work processes. In this study, we adopt a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks, introducing novel system intervention measures aimed at improving classification performance. Our methodology outperforms LLM-generated labels in seven of eight tests, demonstrating an effective strategy for incorporating LLMs into the design and deployment of specialized, supervised learning models present in many industry use cases.

David Farr, Nico Manzonelli, Iain Cruickshank, Jevin West • 2024

Related benchmarks

Task	Dataset	Metric	Result	
Predicting code correctness	LiveCodeBench Python	ECE	0.022	60
Code Correctness Prediction	LiveCodeBench Python	Brier Score	0.079	60
Code Correctness Prediction	MultiPL-E Java	AUROC	0.64	60
Code Correctness Prediction	LiveCodeBench Python	AUROC	76.2	60
Code Correctness Prediction	MultiPL-E Java	Brier Score	0.378	60
Code Correctness Prediction	MultiPL-E Java	ECE	0.375	60
Code correctness classification	LiveSQLBench SQLite	AUROC	0.673	55
Predicting code correctness	LiveSQLBench SQLite	Brier Score	0.518	55
