ToMMeR -- Efficient Entity Mention Detection from Large Language Models
About
Identifying which text spans refer to entities (mention detection) is both foundational for information extraction and a known performance bottleneck. We introduce ToMMeR, a lightweight model (<300K parameters) that probes mention detection capabilities from early LLM layers. Across 13 NER benchmarks, ToMMeR achieves 93% recall zero-shot, with an estimated 90% precision under a human-calibrated LLM-judge protocol, showing that it rarely produces spurious predictions despite its high recall. Cross-model analysis reveals that diverse architectures (14M to 15B parameters) converge on similar mention boundaries (Dice > 75%), confirming that mention detection emerges naturally from language modeling. When extended with span classification heads, ToMMeR achieves competitive NER performance (80-87% F1 on standard benchmarks). Our work provides evidence that structured entity representations exist in early transformer layers and can be efficiently recovered with minimal parameters.
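The abstract reports recall, precision and a Dice agreement score over predicted mentions. The following is a minimal sketch of how such scores can be computed, assuming mentions are compared as exact (start, end) token spans. The paper's actual matching protocol is not stated here, and the function names and example spans are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical convention: a mention is a (start_token, end_token) pair, end exclusive.
Span = Tuple[int, int]


def span_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1 of `pred` against `gold`."""
    tp = len(gold & pred)  # spans whose boundaries match exactly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def dice(a: Set[Span], b: Set[Span]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two mention sets."""
    if not a and not b:
        return 1.0  # two empty predictions agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


# Toy example: gold mentions and the outputs of two hypothetical models.
gold = {(0, 2), (5, 6), (9, 11)}
model_a = {(0, 2), (5, 6), (7, 8)}
model_b = {(0, 2), (5, 6), (9, 11), (12, 13)}

print(span_prf(gold, model_a))  # precision, recall, F1 of model_a
print(dice(model_a, model_b))   # boundary agreement between the two models
```

For exact-match span sets, the Dice coefficient equals the F1 obtained by treating one set as gold and the other as predictions. This is why a cross-model boundary-agreement score sits on the same scale as F1.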
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Named Entity Recognition | CoNLL 03 | -- | -- | 135 |
| Named Entity Recognition | multiNERD | Entity F1 | 45.5 | 50 |
| Entity Mention Detection | ACE05 (351/80/80) | Precision | 31.9 | 14 |
| Named Entity Recognition | CrossNER AI | F1 Score | 64.5 | 12 |
| Named Entity Recognition | GENIA | Micro-F1 | 70.1 | 8 |
| Named Entity Recognition | NCBI | Micro-F1 | 82.1 | 8 |
| Named Entity Recognition | OntoNotes | Micro-F1 | 85.4 | 7 |
| Mention Detection | CoNLL 2003 | Recall | 94.8 | 2 |
| Mention Detection | CrossNER Politics | Recall | 97.0 | 2 |
| Mention Detection | CrossNER Literature | Recall | 94.4 | 2 |