DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models
About
This paper shows how diffusion language models (DLMs) can be used as effective and efficient retrievers. Existing DLM-based retrievers (e.g., DiffEmbed) follow BERT-style encoding, representing each query or passage as a single mean-pooled vector. This ignores how DLMs are trained to generate responses through masked-position prediction under bidirectional attention, a capability that can provide stronger retrieval signals. We propose DiffRetriever, which uses the DLM's native masked-position prediction directly for retrieval. For each query or passage, DiffRetriever appends one or more masked positions, using the outputs as retrieval representations in a single forward pass. With one masked position, single-representation DiffRetriever already improves over DiffEmbed on the same backbones. DiffRetriever also naturally extends to multi-representation retrieval: DLMs process multiple masked positions jointly, enabling ColBERT-style fine-grained matching with little additional encoding latency. In autoregressive LLM retrievers, the same multi-representation strategy requires sequential decoding and therefore incurs much higher latency. DiffRetriever obtains the strongest aggregate effectiveness within our matched comparison, outperforming DiffEmbed, PromptReps, and RepLLaMA. Masked-position counts selected on training data transfer well across datasets, while per-query variation suggests headroom for adaptive allocation. Code is available at https://github.com/ielab/diffretriever.
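The two mechanisms the abstract describes, appending masked positions to an input and ColBERT-style matching over the resulting representations, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `append_masks` and `maxsim_score`, the placeholder `MASK_ID`, and the sum-of-max dot-product scoring rule (the standard ColBERT late-interaction form) are assumptions made here for illustration, and the DLM forward pass that would produce the vectors is omitted.

```python
import numpy as np

MASK_ID = -1  # hypothetical placeholder; a real tokenizer defines its own mask id


def append_masks(token_ids, num_masks, mask_id=MASK_ID):
    """Append `num_masks` masked positions and return their indices.

    The model outputs at the returned positions would serve as the
    retrieval representations (one vector per masked position).
    """
    ids = list(token_ids) + [mask_id] * num_masks
    positions = list(range(len(token_ids), len(ids)))
    return ids, positions


def maxsim_score(query_reps, passage_reps):
    """ColBERT-style late interaction (assumed scoring rule).

    For each query vector, take its best dot-product match among the
    passage vectors, then sum those maxima. With one vector on each
    side this reduces to a plain dot product.
    """
    q = np.asarray(query_reps, dtype=float)    # shape (k_q, d)
    p = np.asarray(passage_reps, dtype=float)  # shape (k_p, d)
    sim = q @ p.T                              # shape (k_q, k_p)
    return float(sim.max(axis=1).sum())


# Usage: two masked positions appended to a three-token input.
ids, positions = append_masks([5, 6, 7], num_masks=2, mask_id=0)

# Toy vectors standing in for the model outputs at the masked positions.
query_vectors = [[1.0, 0.0], [0.0, 1.0]]
passage_vectors = [[1.0, 0.0], [0.5, 0.5]]
score = maxsim_score(query_vectors, passage_vectors)
```

Because all masked positions are filled in the same forward pass, increasing `num_masks` in this sketch lengthens the input slightly but does not add decoding steps, which is the latency argument the abstract makes against autoregressive retrievers.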
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Retrieval | TREC DL 2019 | -- | -- | 83 |
| Information Retrieval | COVID | -- | -- | 50 |
| Information Retrieval | TREC DL 2020 | -- | -- | 33 |
| Information Retrieval | NQ | NDCG@10 (Dense) | 64.4 | 21 |
| Information Retrieval | FiQA | NDCG@10 (Dense) | 0.479 | 21 |
| Information Retrieval | Quora | NDCG@10 (Dense) | 88.7 | 21 |
| Information Retrieval | BEIR-7 Average | NDCG@10 (Dense) | 67.1 | 21 |
| Retrieval | MS Marco | D (MRR@10) | 0.433 | 21 |
| Information Retrieval | SciFact | NDCG@10 (Dense) | 75.2 | 21 |
| Information Retrieval | ArguAna | NDCG@10 (Dense) | 41.4 | 21 |