Strategies for Span Labeling with Large Language Models
About
Large language models (LLMs) are increasingly used for text analysis tasks, such as named entity recognition or error detection. Unlike encoder-based models, however, generative architectures lack an explicit mechanism to refer to specific parts of their input. This leads to a variety of ad-hoc prompting strategies for span labeling, often with inconsistent results. In this paper, we categorize these strategies into three families: tagging the input text, indexing numerical positions of spans, and matching span content. To address the limitations of content matching, we introduce LogitMatch, a new constrained decoding method that forces the model's output to align with valid input spans. We evaluate all methods across four diverse tasks. We find that while tagging remains a robust baseline, LogitMatch improves upon competitive matching-based methods by eliminating span matching issues and outperforms other strategies in some setups.
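The core idea behind constrained decoding to valid input spans can be sketched in a few lines. This is a hypothetical, minimal illustration (not the paper's actual LogitMatch implementation): at each decoding step, only tokens that extend the generated prefix into a contiguous subsequence of the source text are allowed, and all other logits are masked out.

```python
def allowed_next_tokens(source_tokens, generated):
    """Return the set of token ids t such that generated + [t] is still a
    contiguous subsequence (candidate span) of source_tokens."""
    n, m = len(source_tokens), len(generated)
    allowed = set()
    for start in range(n - m):
        # If the generated prefix matches here, the source token that
        # follows it is a valid continuation.
        if source_tokens[start:start + m] == generated:
            allowed.add(source_tokens[start + m])
    return allowed


def mask_logits(logits, allowed):
    """Set logits of disallowed token ids to -inf, so that argmax or
    softmax sampling can only pick a valid span continuation."""
    return [x if i in allowed else float("-inf")
            for i, x in enumerate(logits)]


# Toy example with integer token ids standing in for subword tokens.
source = [5, 3, 7, 3, 9]
prefix = [3]                      # the span decoded so far
allowed = allowed_next_tokens(source, prefix)   # {7, 9}
masked = mask_logits([0.0] * 10, allowed)       # only ids 7 and 9 survive
```

In a real decoder the matching would typically be precomputed (e.g. with a trie over source suffixes) rather than rescanned per step, but the masking principle is the same.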
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Entity-aware Sentence Alignment | ESA-MT | Hard F1 | 14.2 | 40 |
| Entity-aware Sentence Alignment | ESA-MT | Soft F1 | 26.8 | 40 |
| Named Entity Recognition | NER | Soft F1 | 76.5 | 40 |
| Named Entity Recognition | NER | Hard F1 | 72.4 | 40 |
| Common Phrase Labeling | CPL | Soft F1 | 75.2 | 40 |
| Common Phrase Labeling | CPL | Hard F1 | 75.2 | 40 |
| Grammatical Error Correction | GEC | Soft F1 | 33.8 | 40 |
| Grammatical Error Correction | GEC | Hard F1 | 24.3 | 40 |