Interpretable and Low-Resource Entity Matching via Decoupling Feature Learning from Decision Making

About

Entity Matching (EM) aims at recognizing entity records that denote the same real-world object. Neural EM models learn vector representations of entity descriptions and match entities end-to-end. Though robust, these methods require many resources for training and lack interpretability. In this paper, we propose a novel EM framework that consists of Heterogeneous Information Fusion (HIF) and Key Attribute Tree (KAT) Induction to decouple feature representation from matching decision. Using self-supervised learning and the mask mechanism from pre-trained language modeling, HIF learns the embeddings of noisy attribute values by inter-attribute attention with unlabeled data. Using a set of comparison features and a limited amount of annotated data, KAT Induction learns an efficient decision tree that can be interpreted by generating entity matching rules whose structure is advocated by domain experts. Experiments on 6 public datasets and 3 industrial datasets show that our method is highly efficient and outperforms SOTA EM models in most cases. Our code and datasets can be obtained from https://github.com/THU-KEG/HIF-KAT.
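
The idea of decoupling comparison features from the matching decision can be illustrated with a small sketch. This is not the authors' implementation: token-level Jaccard similarity stands in for the HIF embeddings, a tiny greedy Gini-based tree stands in for KAT Induction, and the records, attribute names, and helper functions (`jaccard`, `build_tree`, `rules`) are all hypothetical, chosen only to show how a shallow tree over per-attribute similarities can be read back as matching rules.

```python
from typing import Dict, List

Record = Dict[str, str]

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (assumed stand-in for learned features)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def compare(left: Record, right: Record, attrs: List[str]) -> List[float]:
    """One comparison feature per attribute."""
    return [jaccard(left.get(a, ""), right.get(a, "")) for a in attrs]

def gini(labels: List[int]) -> float:
    """Gini impurity of a binary label list."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(X, y):
    """Return (impurity, feature, threshold) of the best split, or None."""
    best, n = None, len(y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, depth=0, max_depth=2):
    """Leaves are 0/1 labels; inner nodes are (feature, threshold, left, right)."""
    majority = int(sum(y) * 2 >= len(y))
    if depth == max_depth or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None:  # all feature values identical, cannot split
        return majority
    _, f, t = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))

def predict(tree, x) -> int:
    while not isinstance(tree, int):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def rules(tree, attrs, path=()) -> List[str]:
    """Flatten the tree into one IF ... THEN rule per leaf."""
    if isinstance(tree, int):
        cond = " AND ".join(path) or "TRUE"
        return [f"IF {cond} THEN {'match' if tree else 'non-match'}"]
    f, t, left, right = tree
    return (rules(left, attrs, path + (f"sim({attrs[f]}) <= {t:.2f}",)) +
            rules(right, attrs, path + (f"sim({attrs[f]}) > {t:.2f}",)))

# Hypothetical labeled pairs (1 = same real-world object).
ATTRS = ["title", "brand"]
PAIRS = [
    ({"title": "apple iphone 12 64gb", "brand": "apple"},
     {"title": "iphone 12 apple 64gb", "brand": "apple"}, 1),
    ({"title": "apple iphone 12 64gb", "brand": "apple"},
     {"title": "samsung galaxy s21", "brand": "samsung"}, 0),
    ({"title": "hp toner 85a black", "brand": "hp"},
     {"title": "hp 85a black toner cartridge", "brand": "hp"}, 1),
    ({"title": "hp toner 85a black", "brand": "hp"},
     {"title": "hp inkjet printer", "brand": "hp"}, 0),
]

X = [compare(l, r, ATTRS) for l, r, _ in PAIRS]
y = [label for _, _, label in PAIRS]
tree = build_tree(X, y)
for rule in rules(tree, ATTRS):
    print(rule)
```

On this toy data the tree needs only a single split on title similarity, which is the kind of compact, human-readable rule the abstract refers to. In practice the features would come from a learned representation and the tree would be fit on the small labeled set.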

Zijun Yao, Chengjiang Li, Tiansi Dong, Xin Lv, Jifan Yu, Lei Hou, Juanzi Li, Yichi Zhang, Zelin Dai • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Entity Matching | I-A1 10% labeled (test) | F1 Score: 96 | 13 |
| Entity Matching | D-A1 1% labeled train (test) | F1 Score: 96.6 | 13 |
| Entity Matching | D-S1 1% labeled train (test) | F1 Score: 88.2 | 13 |
| Entity Matching | I-A2 10% labeled train (test) | F1 Score: 54.9 | 13 |
| Entity Matching | D-S2 1% labeled train data (test) | F1 Score: 79.5 | 13 |
| Entity Matching | Phone 10% labeled train (test) | F1 Score: 0.949 | 13 |
| Entity Matching | Skirt 1% labeled (test) | F1 Score: 96.7 | 13 |
| Entity Matching | Toner 1% labeled train data (test) | F1 Score: 97.2 | 13 |
| Entity Matching | D-A2 1% labeled train (test) | F1 Score: 80.3 | 11 |
| Entity Matching | iTunes-Amazon Structured I-A1 | F1 Score: 95.9 | 7 |
Showing 10 of 18 rows
