LMK > CLS: Landmark Pooling for Dense Embeddings
About
Representation learning is central to many downstream tasks such as search, clustering, classification, and reranking. State-of-the-art sequence encoders typically collapse a variable-length token sequence to a single vector using a pooling operator, most commonly a special [CLS] token or mean pooling over token embeddings. In this paper, we identify systematic weaknesses of these pooling strategies: [CLS] tends to concentrate information toward the initial positions of the sequence and can under-represent distributed evidence, while mean pooling can dilute salient local signals, sometimes leading to worse short-context performance. To address these issues, we introduce Landmark (LMK) pooling, which partitions a sequence into chunks, inserts landmark tokens between chunks, and forms the final representation by mean-pooling the landmark token embeddings. This simple mechanism improves long-context extrapolation without sacrificing local salient features, at the cost of introducing a small number of special tokens. We empirically demonstrate that LMK pooling matches existing methods on short-context retrieval tasks and yields substantial improvements on long-context tasks, making it a practical and scalable alternative to existing pooling methods.
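The mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the encoder itself is omitted, `LANDMARK_ID` is a hypothetical token id standing in for the special landmark token, and the chunk size is an arbitrary choice for the example.

```python
import numpy as np

# Hypothetical id for the special landmark ([LMK]) token; a real tokenizer
# would register this as an added special token.
LANDMARK_ID = -1

def insert_landmarks(token_ids, chunk_size):
    """Partition the sequence into fixed-size chunks and append a
    landmark token after each chunk."""
    out = []
    for i in range(0, len(token_ids), chunk_size):
        out.extend(token_ids[i:i + chunk_size])
        out.append(LANDMARK_ID)
    return out

def lmk_pool(hidden_states, input_ids):
    """Form the sequence embedding by mean-pooling the encoder outputs
    at the landmark positions only."""
    mask = np.asarray(input_ids) == LANDMARK_ID
    return np.asarray(hidden_states)[mask].mean(axis=0)

# Toy usage: 5 tokens, chunk size 2 -> landmarks at positions 2, 5, 7.
ids = insert_landmarks([1, 2, 3, 4, 5], chunk_size=2)
hidden = np.arange(16, dtype=float).reshape(8, 2)  # stand-in encoder output
embedding = lmk_pool(hidden, ids)
```

Each landmark summarizes its preceding chunk through the encoder's attention, so averaging only the landmark embeddings keeps local salient features that plain mean pooling over all tokens would dilute.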
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Information Retrieval | BEIR | -- | -- | 59 |
| Document Retrieval | MsMARCO (dev) | NDCG@10 | 43.2 | 41 |
| Document Retrieval | MIRACL (dev) | NDCG@10 | 49 | 41 |
| Document Retrieval | COIR | NDCG@10 | 47 | 35 |
| Retrieval | MLDR (test) | NDCG@10 | 35 | 34 |
| Document Retrieval | LongEmbed 6 | NDCG@10 | 70.7 | 29 |
| Information Retrieval | MTEB v2 | NDCG@10 | 45.9 | 28 |
| Document Retrieval | BEIR 15 | NDCG@10 | 0.443 | 21 |
| Multilingual Long Document Retrieval | MLDR 13 (test) | NDCG@10 | 38.7 | 18 |
| Information Retrieval | LongEmbed | NDCG@10 | 62.6 | 14 |