Knowledge Graph Prompting for Multi-Document Question Answering

About

The 'pre-train, prompt, predict' paradigm of large language models (LLMs) has achieved remarkable success in open-domain question answering (OD-QA). However, few works have explored this paradigm for multi-document question answering (MD-QA), a task that demands a thorough understanding of the logical associations among the contents and structures of different documents. To fill this gap, we propose Knowledge Graph Prompting (KGP), a method for formulating the right context when prompting LLMs for MD-QA. KGP consists of a graph construction module and a graph traversal module. For graph construction, we build a knowledge graph (KG) over multiple documents in which nodes represent passages or document structures (e.g., pages and tables), and edges denote semantic/lexical similarity between passages or structural relations within a document. For graph traversal, we design an LLM-based graph traversal agent that navigates across nodes and gathers supporting passages that help LLMs answer the question. The constructed graph serves as a global ruler that constrains the space of transitions among passages and reduces retrieval latency. Meanwhile, the traversal agent acts as a local navigator that gathers pertinent context to progressively approach the question and ensure retrieval quality. Extensive experiments demonstrate the efficacy of KGP for MD-QA, highlighting the potential of leveraging graphs to enhance prompt design for LLMs. Code is available at https://github.com/YuWVandy/KG-LLM-MDQA.
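The abstract describes two modules: a graph whose nodes are passages and whose edges encode lexical similarity or intra-document structure, and a traversal that walks this graph to collect supporting passages. The Python below is a minimal sketch of that idea under stated simplifying assumptions, not the authors' implementation. TF-IDF cosine similarity stands in for the paper's similarity measures. Page/table nodes are reduced to links between consecutive passages of the same document. A greedy token-coverage heuristic replaces the LLM-based traversal agent. The names `build_graph`, `traverse`, and `STOPWORDS` are hypothetical, and so is the toy data; the repository linked above holds the actual code.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, for illustration only (not from the paper).
STOPWORDS = {"a", "an", "the", "is", "was", "in", "on", "at", "of", "which", "where"}

def tokenize(text):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def tfidf_vectors(passages):
    """Sparse TF-IDF vectors (dict token -> weight) with smoothed IDF."""
    counts = [Counter(tokenize(p)) for p in passages]
    n = len(counts)
    df = Counter()
    for c in counts:
        df.update(c.keys())
    return [{t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in c.items()}
            for c in counts]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_graph(passages, doc_ids, k=2):
    """Adjacency sets over passage indices (hypothetical helper).

    Lexical edges: each passage links to its top-k most similar passages (score > 0).
    Structural edges (simplified assumption): consecutive passages of the same document.
    """
    vecs = tfidf_vectors(passages)
    graph = {i: set() for i in range(len(passages))}
    for i in range(len(passages)):
        scored = sorted(
            ((cosine(vecs[i], vecs[j]), j) for j in range(len(passages)) if j != i),
            reverse=True,
        )
        for s, j in scored[:k]:
            if s > 0:
                graph[i].add(j)
                graph[j].add(i)
    for i in range(len(passages) - 1):
        if doc_ids[i] == doc_ids[i + 1]:
            graph[i].add(i + 1)
            graph[i + 1].add(i)
    return graph

def traverse(question, passages, graph, budget=3):
    """Greedy stand-in for the LLM traversal agent (NOT the paper's agent)."""
    q_tokens = set(tokenize(question))
    token_sets = [set(tokenize(p)) for p in passages]
    # Seed with the passage that overlaps the question most (first one on ties).
    start = max(range(len(passages)), key=lambda i: len(token_sets[i] & q_tokens))
    gathered, covered, frontier = [], set(), {start}
    while frontier and len(gathered) < budget:
        uncovered = q_tokens - covered
        # Prefer the neighbour covering the most still-unanswered question terms;
        # ties go to the lower index so the result is deterministic.
        best = max(frontier, key=lambda i: (len(token_sets[i] & uncovered), -i))
        frontier.discard(best)
        gathered.append(best)
        covered |= token_sets[best] & q_tokens
        # Expand only along graph edges: the graph bounds where the walk can go.
        frontier |= graph[best] - set(gathered)
    return gathered

# Toy multi-document example (hypothetical data).
passages = [
    "Alan Turing was born in London.",
    "Turing studied at Kings College in Cambridge.",
    "London is the capital of England.",
    "Cambridge is a city on the River Cam.",
]
doc_ids = ["A", "A", "B", "C"]
graph = build_graph(passages, doc_ids)
question = "Which river flows through the city where Turing studied?"
print(traverse(question, passages, graph, budget=2))  # [1, 3]
```

The walk starts at the passage saying where Turing studied. It then follows the edge created by the shared term "cambridge" to the passage naming the river, which is the two-hop path the question needs. Replacing the scoring key in `traverse` with an LLM call that proposes the next piece of needed evidence would bring the sketch closer to the agent described above.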

Yu Wang, Nedim Lipka, Ryan A. Rossi, Alexa Siu, Ruiyi Zhang, Tyler Derr, 2023

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	2WikiMultihopQA	EM 36.8	559
Multi-hop Question Answering	HotpotQA	F1 Score 58.73	294
Question Answering	NQ (test)	--	133
Question Answering	2WikiMultiHopQA (test)	--	113
Multi-hop Question Answering	Multi-hop RAG	--	77
Multi-hop Question Answering	HotpotQA	LLM Judge Score 62.1	72
Retrieval-Augmented Generation	All Datasets Aggregated	Average Performance Score 49.3	55
Multi-hop Question Answering	MuSiQue	String Accuracy 28.4	44
Multi-hop Question Answering	2WikiMultihopQA	String Accuracy 47.5	44
Multi-hop Question Answering	Multi-Hop QA	2Wiki Accuracy 38.6	37

Showing 10 of 33 rows
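Several rows above report EM (exact match) and F1. The listing does not state how each benchmark normalizes answers, so the Python below is only a sketch of the SQuAD-style definitions that are commonly used for HotpotQA- and 2WikiMultihopQA-style evaluation. It covers EM and token-level F1 and does not cover "String Accuracy", "LLM Judge Score", or the aggregated score. The function names are illustrative.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)

def normalize_answer(s):
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))

def f1_score(prediction, gold):
    """Token-level F1 between normalized prediction and gold answer."""
    pred = normalize_answer(prediction).split()
    ref = normalize_answer(gold).split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The River Cam", "river cam"))               # 1.0
print(f1_score("the River Cam in Cambridge", "River Cam"))     # 0.6666666666666666
```

EM gives no credit for a verbose but correct answer, while F1 gives partial credit for token overlap. This is one reason the same system can score quite differently on the two metrics.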
