Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration

About

Entity resolution (ER) is an important data integration task with a wide spectrum of applications. State-of-the-art ER solutions rely on pre-trained language models (PLMs), which require fine-tuning on large numbers of labeled matching/non-matching entity pairs. Recently, large language models (LLMs), such as GPT-4, have shown the ability to perform many tasks without tuning model parameters, a capability known as in-context learning (ICL), which facilitates effective learning from a few labeled demonstrations supplied in the input context. However, existing ICL approaches to ER typically require a task description and a set of demonstrations for each entity pair, and are therefore limited by the monetary cost of interfacing with LLMs. To address this problem, we provide a comprehensive study of how to develop a cost-effective batch prompting approach to ER. We introduce a framework, BATCHER, consisting of demonstration selection and question batching, and explore different design choices that support batch prompting for ER. We also devise a covering-based demonstration selection strategy that achieves an effective balance between matching accuracy and monetary cost. We conduct a thorough evaluation to explore the design space and evaluate our proposed strategies. Through extensive experiments, we find that batch prompting is very cost-effective for ER, compared not only with PLM-based methods fine-tuned on extensive labeled data but also with LLM-based methods that use manually designed prompts. We also provide guidance for selecting appropriate design choices for batch prompting.
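The abstract names two components, demonstration selection and question batching, but gives no implementation details. The following is a minimal sketch of the general idea only, not BATCHER's actual method. Everything in it is an assumption made for illustration: the function names, the token-level Jaccard similarity, the 0.2 coverage threshold, and the prompt wording. It greedily picks demonstrations until each question in a batch is "covered" by at least one similar demonstration, then builds a single prompt that states the task description once for the whole batch.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]    # two entity descriptions to compare
Demo = Tuple[Pair, bool]  # a labeled pair: True means "match"


def tokens(pair: Pair) -> Set[str]:
    """Lower-cased whitespace tokens of both records (hypothetical featurization)."""
    a, b = pair
    return set(a.lower().split()) | set(b.lower().split())


def jaccard(x: Set[str], y: Set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def select_covering_demos(
    questions: List[Pair], pool: List[Demo], threshold: float = 0.2
) -> List[Demo]:
    """Greedy set cover: repeatedly pick the demonstration that covers the most
    still-uncovered questions. The similarity measure and threshold are assumed."""
    q_tokens = [tokens(q) for q in questions]
    uncovered = set(range(len(questions)))
    remaining = list(range(len(pool)))
    chosen: List[Demo] = []
    while uncovered and remaining:
        best, best_cov = None, set()
        for i in remaining:
            d_tok = tokens(pool[i][0])
            cov = {q for q in uncovered if jaccard(q_tokens[q], d_tok) >= threshold}
            if len(cov) > len(best_cov):
                best, best_cov = i, cov
        if best is None:  # no remaining demonstration covers anything new
            break
        chosen.append(pool[best])
        remaining.remove(best)
        uncovered -= best_cov
    return chosen


def build_batch_prompt(task: str, demos: List[Demo], questions: List[Pair]) -> str:
    """One prompt for the whole batch: the task text and demonstrations appear once."""
    lines = [task, ""]
    for (a, b), label in demos:
        lines.append(f"Demo: A: {a} | B: {b} -> {'yes' if label else 'no'}")
    lines.append("")
    for i, (a, b) in enumerate(questions, 1):
        lines.append(f"Q{i}: A: {a} | B: {b}")
    lines.append("Answer each question with yes or no, one per line.")
    return "\n".join(lines)
```

In this sketch the cost saving comes from amortization: a per-pair approach would repeat the task description and demonstrations for every question, whereas the batched prompt pays for them once per batch.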

Meihao Fan, Xiaoyue Han, Ju Fan, Chengliang Chai, Nan Tang, Guoliang Li, Xiaoyong Du • 2023

Related benchmarks

Task               Dataset    Result                 Rank
Entity Resolution  Alaska     FP 54.28               6
Entity Resolution  music      FP 58.38               6
Entity Resolution  Movies     FP 43.76               6
Entity Resolution  Census     FP Rate 62.77          6
Entity Resolution  Cora       FP Rate 78.46          6
Entity Resolution  AS         False Positives 53.36  6
Entity Resolution  Amazon-GP  FP 67.69               6
Entity Resolution  SONG       FP Score 63.41         6
