
Multiview Identifiers Enhanced Generative Retrieval

About

Instead of simply matching a query to pre-existing passages, generative retrieval generates identifier strings of passages as the retrieval target. The catch is that the identifier must be distinctive enough to represent a passage. Current approaches use either a numeric ID or a text piece (such as a title or substrings) as the identifier. However, these identifiers cannot cover a passage's content well. We therefore propose a new type of identifier, synthetic identifiers, which are generated from the content of a passage and can integrate contextualized information that text pieces lack. Furthermore, we consider multiview identifiers simultaneously, including synthetic identifiers, titles, and substrings. These views complement each other and enable holistic ranking of passages from multiple perspectives. We conduct a series of experiments on three public datasets, and the results indicate that our proposed approach performs best among generative retrieval methods, demonstrating its effectiveness and robustness.

Yongqi Li, Nan Yang, Liang Wang, Furu Wei, Wenjie Li · 2023
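The abstract says the three identifier views (titles, substrings, synthetic identifiers) are combined to rank passages holistically, but it does not spell out the scoring rule. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each view's decoder has already produced identifier strings paired with a positive score, that synthetic identifiers were generated offline for every passage (the `synthetic_ids` field is hypothetical), and that a passage's total score is the sum of the scores of all generated identifiers it contains, across views. All names (`Passage`, `matches`, `rank_passages`) are illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Passage:
    pid: str
    title: str
    body: str
    # Hypothetical: synthetic identifiers (e.g. pseudo-queries) produced
    # offline from the passage content by some generation model.
    synthetic_ids: list[str] = field(default_factory=list)


def matches(view: str, identifier: str, passage: Passage) -> bool:
    """Return True if an identifier generated for `view` points to `passage`."""
    if view == "title":
        return identifier == passage.title
    if view == "substring":
        return identifier in passage.body
    if view == "synthetic":
        return identifier in passage.synthetic_ids
    raise ValueError(f"unknown view: {view}")


def rank_passages(generated: dict[str, list[tuple[str, float]]],
                  passages: list[Passage]) -> list[tuple[str, float]]:
    """Sum the (assumed positive) scores of every generated identifier a
    passage matches, across all views, and sort passages by that total.
    Passages that match no identifier are left out of the ranking."""
    totals: dict[str, float] = defaultdict(float)
    for view, candidates in generated.items():
        for identifier, score in candidates:
            for p in passages:
                if matches(view, identifier, p):
                    totals[p.pid] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Toy usage with made-up passages and made-up generation scores.
passages = [
    Passage("p1", "Eiffel Tower",
            "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
            ["where is the eiffel tower", "eiffel tower material"]),
    Passage("p2", "Statue of Liberty",
            "The Statue of Liberty stands on Liberty Island in New York.",
            ["where is the statue of liberty"]),
]
generated = {
    "title": [("Eiffel Tower", 0.6), ("Statue of Liberty", 0.3)],
    "substring": [("lattice tower in Paris", 0.5)],
    "synthetic": [("where is the eiffel tower", 0.7)],
}
ranking = rank_passages(generated, passages)
```

Here `p1` ranks first because all three views point to it, while `p2` is supported only by its title. A practical system would constrain decoding so that generated substrings are guaranteed to occur in the corpus (for example with an FM-index) instead of scanning every passage linearly as this toy version does.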

Related benchmarks

Task | Dataset | Metric | Result | Rank
Information Retrieval | NQ320k | Hits@1 | 31 | 54
Document Retrieval | MS MARCO MS300K (test) | MRR@10 | 42.51 | 36
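The benchmark rows above report Hits@1 and MRR@10. The sketch below uses the common textbook definitions of these two metrics, which are an assumption here since the listing does not state its exact evaluation protocol (for example, how multiple relevant passages per query are handled). Such scores are usually reported as percentages, so a value like 42.51 would correspond to 0.4251 on the 0 to 1 scale used in the code. The function names are illustrative.

```python
def hits_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> float:
    """1.0 if any relevant passage appears in the top-k results, else 0.0."""
    return 1.0 if any(pid in relevant for pid in ranked_ids[:k]) else 0.0


def mrr_at_k(ranked_ids: list[str], relevant: set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant result within the top k, else 0.0."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# Three toy queries: (system ranking, set of relevant passage ids).
runs = [
    (["d3", "d1", "d7"], {"d1"}),  # first relevant result at rank 2
    (["d2", "d5"], {"d2"}),        # first relevant result at rank 1
    (["d4", "d6"], {"d9"}),        # relevant passage not retrieved
]
hits1 = mean([hits_at_k(r, rel, 1) for r, rel in runs])
mrr10 = mean([mrr_at_k(r, rel, 10) for r, rel in runs])
```

Averaged over these toy queries, Hits@1 is 1/3 (only the second query has a relevant passage at rank 1) and MRR@10 is (0.5 + 1.0 + 0.0) / 3 = 0.5.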
