
Adaptive Graph Refinement and Label Propagation with LLMs for Cost-Effective Entity Resolution

About

Dirty entity resolution (ER), which identifies records referring to the same real-world entity from a single, messy dataset, is a fundamental task in data management and mining. However, the dominant blocking-matching-clustering paradigm for ER suffers from critical flaws. Its cascaded, decoupled workflow essentially produces a static, sparse graph plagued by missing edges (due to blocking failures) and noisy links (due to matching errors), causing error propagation and yielding suboptimal clusters, particularly when rigid transitivity is imposed during clustering. We contend that matching and clustering are fundamentally synergistic, both optimizing for the construction of an ideal entity graph. Building upon this insight, we propose Alper, a unified framework that integrates these steps into an iterative probabilistic label propagation process over a global, evolving graph. Unlike disjoint blocking, Alper refines the graph structure and labels dynamically by adaptively integrating "weak but cheap" signals from graph propagation with "strong but expensive" LLM-based pairwise queries. For higher cost-effectiveness, we formulate the signal selection as a constrained optimization problem maximizing cumulative marginal gain under a query budget, solved via a greedy algorithm with provable theoretical guarantees. Extensive experiments on eight benchmark datasets demonstrate that Alper is consistently superior to state-of-the-art cascaded pipelines.

Hongtao Wang, Renchi Yang, Haoran Zheng, Xiangyu Ke • 2026
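
The interplay the abstract describes, cheap propagation signals combined with a limited number of expensive pairwise queries, can be illustrated with a toy sketch. This is not the authors' algorithm: the entropy-based gain, the one-round soft-transitivity update in `propagate`, the `oracle` callable standing in for an LLM query, and all function names are assumptions made purely for illustration.

```python
import math


def edge(u, v):
    """Canonical (sorted) key for an undirected record pair."""
    return (u, v) if u <= v else (v, u)


def entropy(p):
    """Binary entropy in bits; used here as an assumed proxy for query gain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def propagate(probs, fixed):
    """One round of a toy 'weak but cheap' signal: raise an unqueried edge
    to the best two-hop path product p(u, w) * p(w, v) if that is higher."""
    nodes = {n for pair in probs for n in pair}
    old = dict(probs)  # read from a snapshot so update order does not matter
    for (u, v), p in old.items():
        if (u, v) in fixed:
            continue  # queried labels are treated as final
        via = max(
            (old.get(edge(u, w), 0.0) * old.get(edge(w, v), 0.0)
             for w in nodes if w not in (u, v)),
            default=0.0,
        )
        probs[(u, v)] = max(p, via)


def greedy_query_selection(edge_probs, oracle, budget):
    """Spend at most `budget` oracle calls, each time on the pair whose
    current match probability is most uncertain, then re-propagate."""
    probs = dict(edge_probs)  # do not mutate the caller's dictionary
    queried = []
    for _ in range(budget):
        candidates = [e for e in probs if e not in queried]
        if not candidates:
            break
        best = max(candidates, key=lambda e: entropy(probs[e]))
        if entropy(probs[best]) == 0.0:
            break  # nothing uncertain is left to ask about
        probs[best] = 1.0 if oracle(*best) else 0.0  # 'strong but expensive'
        queried.append(best)
        propagate(probs, set(queried))
    return probs, queried


# Hypothetical data: records a, b, c are the same entity; d is different.
initial = {("a", "b"): 0.5, ("b", "c"): 0.9, ("a", "c"): 0.4, ("c", "d"): 0.1}
truth = {"a": 0, "b": 0, "c": 0, "d": 1}
final, asked = greedy_query_selection(
    initial, lambda u, v: truth[u] == truth[v], budget=1
)
print(asked, final)
```

With a budget of one, the sketch queries the most ambiguous pair and lets the confirmed label lift a neighbouring edge through propagation, which is the kind of trade-off a query budget is meant to exploit. A real system would replace the entropy proxy with whatever marginal-gain objective it actually optimizes.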

Related benchmarks

Task               Dataset    Result                 Rank
Entity Resolution  SONG       FP Score 91.12         6
Entity Resolution  Census     FP Rate 82.15          6
Entity Resolution  Cora       FP Rate 86.34          6
Entity Resolution  AS         False Positives 71.71  6
Entity Resolution  Amazon-GP  FP 87.38               6
Entity Resolution  Alaska     FP 79.58               6
Entity Resolution  music      FP 77.26               6
Entity Resolution  Movies     FP 64.68               6
Entity Resolution  Cora       FP Rate 86.34          5
Entity Resolution  Alaska     False Positives 79.58  5
Showing 10 of 18 rows
