RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science

About

Large language models (LLMs) have enhanced our ability to rapidly analyze and classify unstructured natural language data. However, concerns regarding cost, network limitations, and security constraints have posed challenges for their integration into work processes. In this study, we adopt a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks, introducing novel system intervention measures aimed at improving classification performance. Our methodology outperforms LLM-generated labels in seven of eight tests, demonstrating an effective strategy for incorporating LLMs into the design and deployment of specialized, supervised learning models present in many industry use cases.

David Farr, Nico Manzonelli, Iain Cruickshank, Jevin West • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Predicting code correctness | LiveCodeBench Python | ECE 0.022 | 60 |
| Code Correctness Prediction | LiveCodeBench Python | Brier Score 0.079 | 60 |
| Code Correctness Prediction | MultiPL-E Java | AUROC 0.64 | 60 |
| Code Correctness Prediction | LiveCodeBench Python | AUROC 76.2 | 60 |
| Code Correctness Prediction | MultiPL-E Java | Brier Score 0.378 | 60 |
| Code Correctness Prediction | MultiPL-E Java | ECE 0.375 | 60 |
| Code correctness classification | LiveSQLBench SQLite | AUROC 0.673 | 55 |
| Predicting code correctness | LiveSQLBench SQLite | Brier Score 0.518 | 55 |