Universal Information Extraction as Unified Semantic Matching

About

The challenge of information extraction (IE) lies in the diversity of label schemas and the heterogeneity of structures. Traditional methods require task-specific model designs and rely heavily on expensive supervision, which makes them difficult to generalize to new schemas. In this paper, we decouple IE into two basic abilities, structuring and conceptualizing, which are shared across tasks and schemas. Based on this paradigm, we propose to model various IE tasks universally with the Unified Semantic Matching (USM) framework, which introduces three unified token linking operations to model the abilities of structuring and conceptualizing. In this way, USM can jointly encode the schema and the input text, extract substructures uniformly and in parallel, and decode target structures controllably on demand. Empirical evaluation on 4 IE tasks shows that the proposed method achieves state-of-the-art performance in supervised experiments and strong generalization ability in zero-shot and few-shot transfer settings.
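To make the split between structuring and conceptualizing more concrete, here is a toy Python sketch of how extraction results might be decoded from thresholded "link" scores. It is a simplification, not the USM implementation and not a faithful rendering of the paper's three linking operations: the link types (`span_links`, `pair_links`, `type_links`, `rel_links`), the `decode` function, the 0.5 threshold, and all scores are hypothetical stand-ins for a trained model's outputs. The `wanted` label set loosely mimics decoding only the target schema on demand.

```python
# Toy sketch: decode entities and relations from hypothetical link scores.
# Names, link types, threshold, and scores are illustrative assumptions,
# not the USM authors' code or API.

THRESHOLD = 0.5


def decode(tokens, span_links, pair_links, type_links, rel_links, wanted):
    """Turn thresholded link scores into entities and relation triples.

    Only labels listed in `wanted` are decoded, loosely mimicking
    on-demand decoding of a requested schema.
    """
    def text(span):
        start, end = span  # inclusive token indices
        return " ".join(tokens[start:end + 1])

    # Structuring: a span exists if its head->tail link passes the threshold.
    spans = {s for s, p in span_links.items() if p >= THRESHOLD}

    # Conceptualizing: a label->span link assigns a type to an extracted span.
    entities = sorted(
        (text(s), label)
        for (label, s), p in type_links.items()
        if p >= THRESHOLD and s in spans and label in wanted
    )

    # A relation needs a linked subject->object pair plus a relation-label link.
    relations = sorted(
        (text(h), label, text(t))
        for (h, t, label), p in rel_links.items()
        if p >= THRESHOLD and label in wanted
        and h in spans and t in spans
        and pair_links.get((h, t), 0.0) >= THRESHOLD
    )
    return entities, relations


tokens = ["Steve", "Jobs", "works", "for", "Apple"]
span_links = {(0, 1): 0.97, (4, 4): 0.92, (2, 3): 0.10}
pair_links = {((0, 1), (4, 4)): 0.88}
type_links = {("person", (0, 1)): 0.95, ("organization", (4, 4)): 0.90}
rel_links = {((0, 1), (4, 4), "work for"): 0.81}

entities, relations = decode(tokens, span_links, pair_links, type_links,
                             rel_links, wanted={"person", "organization", "work for"})
print(entities)   # [('Apple', 'organization'), ('Steve Jobs', 'person')]
print(relations)  # [('Steve Jobs', 'work for', 'Apple')]
```

Passing `wanted={"person"}` instead would return only `[('Steve Jobs', 'person')]` and no relations, since the same links are decoded against a narrower schema. The sketch keeps structure (which spans and pairs exist) separate from concepts (which labels attach to them), which is the decomposition the abstract describes.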

Jie Lou, Yaojie Lu, Dai Dai, Wei Jia, Hongyu Lin, Xianpei Han, Le Sun, Hua Wu · 2023

Related benchmarks

Task	Dataset	Metric	Result	#
Named Entity Recognition	CoNLL 03	F1 (Entity)	93.16	135
Relation Extraction	SciERC	Relation Strict F1	37.4	68
Relation Extraction	CoNLL 04	F1	78.8	59
Relation Extraction	CoNLL 04	Relation Strict F1	78.84	52
Named Entity Recognition	ACE 2005	Entity F1	87.14	42
Named Entity Recognition	Cross-domain NER datasets out-of-domain	AI NER Score	28.2	23
Event Argument Extraction	ACE 2005	F1 Score	55.83	21
Event Argument Extraction	CASIE	F1 Score	63.26	17
Event Detection	ACE 2005 (test)	F1 Score	69.3	15
Named Entity Recognition	CrossNER AI	--	--	12

(10 of 19 benchmark results shown)
