DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition
About
Large language models (LLMs) have advanced information extraction (IE) by enabling zero-shot and few-shot named entity recognition (NER), yet their generative outputs still show persistent, systematic errors. Despite progress through instruction fine-tuning, zero-shot NER still lags far behind supervised systems. These recurring errors mirror the inconsistencies observed in early-stage human annotation, where disagreements are resolved through pilot annotation. Motivated by this analogy, we introduce DiZiNER (Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition), a framework that simulates the pilot annotation process, with LLMs acting as both annotators and supervisors. Multiple heterogeneous LLMs annotate shared texts, and a supervisor model analyzes inter-model disagreements to refine the task instructions. Across 18 benchmarks, DiZiNER achieves zero-shot state-of-the-art (SOTA) results on 14 datasets, improving on prior bests by 8.0 F1 and narrowing the gap between zero-shot and supervised performance by more than 11 points. It also consistently outperforms its supervisor, GPT-5 mini, indicating that the improvements stem from disagreement-guided instruction refinement rather than model capacity. Pairwise agreement between models correlates strongly with NER performance, further supporting this finding.
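The abstract does not say how pairwise agreement or disagreements are computed, so the following is only a minimal sketch under stated assumptions: each annotator's output is a set of `(start, end, label)` tuples, agreement between two annotators is the F1 overlap of their entity sets, and a "disagreement" is any entity not proposed by every annotator. The names `pairwise_f1`, `mean_pairwise_agreement`, and `disagreements` are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Assumed representation: (start_token, end_token, label).
Entity = Tuple[int, int, str]


def pairwise_f1(a: Set[Entity], b: Set[Entity]) -> float:
    """F1 overlap between two annotators' entity sets (exact span and label match)."""
    if not a and not b:
        return 1.0  # two empty annotations agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def mean_pairwise_agreement(annotations: Dict[str, Set[Entity]]) -> float:
    """Average pairwise F1 over all annotator pairs (requires at least two annotators)."""
    pairs = list(combinations(annotations.values(), 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    return sum(pairwise_f1(a, b) for a, b in pairs) / len(pairs)


def disagreements(annotations: Dict[str, Set[Entity]]) -> Dict[Entity, Set[str]]:
    """Map each entity that is NOT proposed by every annotator to the annotators who proposed it."""
    votes: Dict[Entity, Set[str]] = {}
    for name, entities in annotations.items():
        for entity in entities:
            votes.setdefault(entity, set()).add(name)
    return {e: who for e, who in votes.items() if len(who) < len(annotations)}
```

In a loop of this kind, the output of `disagreements` would be the material handed to the supervisor model when it revises the instructions, and `mean_pairwise_agreement` could be tracked across rounds as a label-free signal of progress. Both uses are assumptions about how such a pipeline might be wired, not a description of the authors' implementation.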
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Named Entity Recognition | CoNLL 03 | F1 (Entity) | 86.9 | 135 |
| Named Entity Recognition | OntoNotes | F1-score | 62.5 | 121 |
| Named Entity Recognition | BC5CDR | F1 Score | 78.9 | 102 |
| Named Entity Recognition | MIT Movie | Entity F1 | 76.2 | 71 |
| Named Entity Recognition | GENIA | F1 Score | 60.1 | 58 |
| Named Entity Recognition | MIT Restaurant | -- | -- | 57 |
| Named Entity Recognition | multiNERD | Entity F1 | 80.6 | 50 |
| Named Entity Recognition | bc2gm | Entity F1 | 71 | 48 |
| Named Entity Recognition | ACE05 | F1 Score | 45 | 48 |
| Named Entity Recognition | FabNER | Entity F1 | 29.5 | 45 |