KnowCoder-X: Boosting Multilingual Information Extraction via Code

About

Empirical evidence indicates that LLMs exhibit spontaneous cross-lingual alignment. However, although LLMs show promising cross-lingual alignment in Information Extraction (IE), a significant imbalance across languages persists, which points to an underlying deficiency. To address this, we propose KnowCoder-X, a code LLM with advanced cross-lingual and multilingual capabilities for universal IE. First, it standardizes the representation of multilingual schemas using Python classes, ensuring a consistent ontology across languages, so that IE in every language is formulated as a single, unified code generation task. Second, we conduct IE cross-lingual alignment instruction tuning on a translated instance prediction task to enhance the model's cross-lingual transferability. During this phase, we also construct ParallelNER, a high-quality and diverse bilingual IE parallel dataset with 257k samples, synthesized by our proposed three-stage pipeline, with manual annotation to ensure quality. Even without training on 29 unseen languages, KnowCoder-X surpasses ChatGPT by 30.17% and the SoTA by 20.03%, demonstrating superior cross-lingual IE capabilities. Comprehensive evaluations on 64 IE benchmarks in Chinese and English under various settings demonstrate that KnowCoder-X significantly enhances cross-lingual IE transfer by boosting IE alignment. Our code and dataset are available at: https://github.com/ICT-GoKnow/KnowCoder
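
To make the "schemas as Python classes" idea concrete, here is a minimal sketch. It is an illustration under assumptions, not the authors' actual schema or prompt format: the class names `Entity`, `Person`, and `Organization`, the `results = [...]` output convention, and the `parse_output` helper are all hypothetical. The point it shows is that English and Chinese inputs can map to the same class-based ontology, with the model's output being code that instantiates those classes.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Base class for every entity type in this hypothetical schema."""
    name: str  # surface span as it appears in the input text


class Person(Entity):
    """A human being."""


class Organization(Entity):
    """A company, institution, or agency."""


# Hypothetical schema registry: class name -> class.
SCHEMA = {"Person": Person, "Organization": Organization}


def parse_output(code: str) -> list:
    """Turn generated code such as `results = [Person("...")]` into objects.

    The code is parsed with `ast` rather than executed, so only calls to
    known schema classes with a single string argument are accepted.
    """
    entities = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            cls = SCHEMA.get(node.func.id)
            args = node.args
            if (cls is not None and len(args) == 1
                    and isinstance(args[0], ast.Constant)
                    and isinstance(args[0].value, str)):
                entities.append(cls(args[0].value))
    return entities


# Assumed model outputs for an English sentence and its Chinese translation.
generated_en = 'results = [Person("Marie Curie"), Organization("Sorbonne")]'
generated_zh = 'results = [Person("玛丽·居里"), Organization("索邦大学")]'

entities_en = parse_output(generated_en)
entities_zh = parse_output(generated_zh)
```

Because both outputs instantiate the same classes, the entity types line up across languages even though the surface strings differ, which is the property the unified code representation is meant to provide.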

Yuxin Zuo, Wenxuan Jiang, Wenxuan Liu, Zixuan Li, Long Bai, Hanbin Wang, Yutao Zeng, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Named Entity Recognition | OntoNotes | F1 Score 87.91 | 91 |
| Named Entity Recognition | CoNLL 2003 | F1 Score 94.69 | 86 |
| Named Entity Recognition | WNUT 2017 | F1 Score 68.72 | 79 |
| Named Entity Recognition | BC5CDR | F1 Score 88.46 | 59 |
| Named Entity Recognition | MIT Restaurant | -- | 50 |
| Named Entity Recognition | OntoNotes 5 | -- | 44 |
| Named Entity Recognition | ACE05 | F1 Score 87.49 | 38 |
| Named Entity Recognition | GENIA | F1 Score 78.97 | 37 |
| Named Entity Recognition | WikiAnn | F1 Score 84.69 | 32 |
| Named Entity Recognition | MSRA | F1 Score 96.01 | 29 |
Showing 10 of 65 rows
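
The page does not state how the F1 scores above are computed. As a rough guide, the sketch below assumes the common NER convention of strict, entity-level micro-F1, where a prediction counts as correct only if both the span text and the type match a gold entity. The function name `micro_f1` and the `(span, type)` tuple layout are illustrative choices, not taken from the source.

```python
def micro_f1(gold, pred):
    """Entity-level micro-F1 over parallel lists of per-sentence entity sets.

    Each entity is a (span_text, entity_type) tuple; a match requires both
    fields to be identical (strict matching, an assumption of this sketch).
    """
    tp = fp = fn = 0
    for gold_sent, pred_sent in zip(gold, pred):
        gold_set, pred_set = set(gold_sent), set(pred_sent)
        tp += len(gold_set & pred_set)   # predicted and correct
        fp += len(pred_set - gold_set)   # predicted but not in gold
        fn += len(gold_set - pred_set)   # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [{("Marie Curie", "PER"), ("Sorbonne", "ORG")}]
pred = [{("Marie Curie", "PER"), ("Paris", "LOC")}]
score = micro_f1(gold, pred)  # one hit, one false positive, one miss
```

In this toy case precision and recall are both 0.5, so the F1 is 0.5; leaderboard values such as those in the table are usually this quantity expressed as a percentage.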
