ToMMeR -- Efficient Entity Mention Detection from Large Language Models

About

Identifying which text spans refer to entities (mention detection) is both foundational for information extraction and a known performance bottleneck. We introduce ToMMeR, a lightweight model (<300K parameters) probing mention detection capabilities from early LLM layers. Across 13 NER benchmarks, ToMMeR achieves 93% recall zero-shot, with an estimated 90% precision under a human-calibrated LLM-judge protocol, showing that ToMMeR rarely produces spurious predictions despite high recall. Cross-model analysis reveals that diverse architectures (14M-15B parameters) converge on similar mention boundaries (DICE >75%), confirming that mention detection emerges naturally from language modeling. When extended with span classification heads, ToMMeR achieves competitive NER performance (80-87% F1 on standard benchmarks). Our work provides evidence that structured entity representations exist in early transformer layers and can be efficiently recovered with minimal parameters.
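The abstract reports recall against gold mentions and a DICE score between the mention sets predicted from different models. As a rough illustration of what such span-level scores measure, here is a minimal sketch that assumes mentions are compared by exact (start, end) token offsets; the matching criterion actually used in the paper is not stated on this page, and the function names and toy spans below are hypothetical.

```python
from typing import Set, Tuple

# A mention is a (start, end) token span, end exclusive (assumed convention).
Span = Tuple[int, int]


def dice(a: Set[Span], b: Set[Span]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) over exact-match spans."""
    if not a and not b:
        return 1.0  # two empty predictions agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def recall(predicted: Set[Span], gold: Set[Span]) -> float:
    """Fraction of gold spans recovered exactly by the prediction."""
    if not gold:
        return 1.0
    return len(predicted & gold) / len(gold)


# Toy spans for one sentence (illustrative only, not data from the paper).
model_a = {(0, 2), (5, 6), (9, 12)}
model_b = {(0, 2), (5, 6), (8, 12)}
gold = {(0, 2), (9, 12)}

print(dice(model_a, model_b))  # two shared spans out of 3 + 3
print(recall(model_a, gold))   # both gold spans are in model_a
```

Under exact matching, the near-miss span (8, 12) versus (9, 12) counts as a full disagreement; a relaxed or overlap-based criterion would score the two models as closer.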

Victor Morand, Nadi Tomeh, Josiane Mothe, Benjamin Piwowarski • 2025

Related benchmarks

Task	Dataset	Result
Named Entity Recognition	CoNLL 03	--	140
Named Entity Recognition	MultiNERD	Entity F1 45.5	50
Named Entity Recognition	CrossNER AI	F1 Score 64.5	16
Entity Mention Detection	ACE05 (351/80/80)	Precision 31.9	14
Named Entity Recognition	GENIA	Micro-F1 70.1	8
Named Entity Recognition	NCBI	Micro-F1 82.1	8
Named Entity Recognition	OntoNotes	Micro-F1 85.4	7
Mention Detection	CoNLL 2003	Recall 94.8	2
Mention Detection	CrossNER Politics	Recall 97	2
Mention Detection	CrossNER Literature	Recall 94.4	2

Showing 10 of 24 rows
