
PromptEM: Prompt-tuning for Low-resource Generalized Entity Matching

About

Entity Matching (EM), which aims to identify whether two entity records from two relational tables refer to the same real-world entity, is one of the fundamental problems in data management. Traditional EM assumes that the two tables are homogeneous and share an aligned schema, whereas in practical scenarios entity records often come in different formats (e.g., relational, semi-structured, or textual). Because of these format differences, unifying their schemas is impractical. To support EM on records of different formats, Generalized Entity Matching (GEM) has been proposed and has recently gained much attention. Existing GEM methods are typically supervised and rely on a large number of high-quality labeled examples. However, the labeling process is extremely labor-intensive, which hinders the use of GEM. Low-resource GEM, i.e., GEM that requires only a small number of labeled examples, has therefore become an urgent need. To this end, this paper is the first to focus on low-resource GEM and proposes a novel low-resource GEM method, termed PromptEM. PromptEM addresses three challenges in low-resource GEM: designing GEM-specific prompt-tuning, improving pseudo-label quality, and running self-training efficiently. Extensive experimental results on eight real benchmarks demonstrate the superiority of PromptEM in terms of effectiveness and efficiency.
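To make the idea of pseudo-labels in self-training more concrete, the following is a minimal sketch of one generic self-training round with confidence-threshold selection. It is not PromptEM's actual procedure, which the abstract does not specify. The names `jaccard_score`, `select_pseudo_labels`, and `self_training_round`, the 0.9 threshold, and the use of token overlap as a stand-in for a trained matcher are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]          # two serialized entity records
Labeled = Tuple[Pair, int]      # (pair, label) with 1 = match, 0 = non-match


def jaccard_score(pair: Pair) -> float:
    """Hypothetical stand-in for a matcher: token-set Jaccard overlap in [0, 1]."""
    left, right = (set(text.lower().split()) for text in pair)
    union = left | right
    return len(left & right) / len(union) if union else 0.0


def select_pseudo_labels(probs: List[float], threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Keep only confident predictions: (index, 1) if p >= threshold, (index, 0) if p <= 1 - threshold."""
    selected = []
    for i, p in enumerate(probs):
        if p >= threshold:
            selected.append((i, 1))
        elif p <= 1 - threshold:
            selected.append((i, 0))
    return selected


def self_training_round(
    labeled: List[Labeled],
    unlabeled: List[Pair],
    score_fn: Callable[[Pair], float],
    threshold: float = 0.9,
) -> Tuple[List[Labeled], List[Pair]]:
    """Move confidently scored pairs from the unlabeled pool into the labeled set."""
    probs = [score_fn(pair) for pair in unlabeled]
    chosen = dict(select_pseudo_labels(probs, threshold))
    new_labeled = labeled + [(unlabeled[i], label) for i, label in chosen.items()]
    remaining = [pair for i, pair in enumerate(unlabeled) if i not in chosen]
    return new_labeled, remaining
```

In a real pipeline the scoring function would be the current model, retrained on the enlarged labeled set after each round. The threshold trades pseudo-label quality against the number of examples added per round.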

Pengfei Wang, Xiaocan Zeng, Lu Chen, Fan Ye, Yuren Mao, Junhao Zhu, Yunjun Gao • 2022

Related benchmarks

Task                          Dataset       Result            Rank
Entity Matching               GEO           Precision 33.7    9
Entity Matching               Music 20K     Precision 41.1    8
Multi-table Entity Matching   Shopee        Precision 2.2     7
Entity Matching               Music-200K    Precision 29.4    6
Entity Matching               Computers     F1 Score 49.5     5
Entity Matching               geo-heter     F1 Score 78.5     5
Entity Matching               ISWC          F1 Score 76.4     5
Entity Matching               cameras       F1 Score 35.4     5
