
Knowledge Graph Prompting for Multi-Document Question Answering

About

The "pre-train, prompt, predict" paradigm of large language models (LLMs) has achieved remarkable success in open-domain question answering (OD-QA). However, few works explore this paradigm in the scenario of multi-document question answering (MD-QA), a task demanding a thorough understanding of the logical associations among the contents and structures of different documents. To fill this crucial gap, we propose a Knowledge Graph Prompting (KGP) method to formulate the right context in prompting LLMs for MD-QA, which consists of a graph construction module and a graph traversal module. For graph construction, we create a knowledge graph (KG) over multiple documents with nodes symbolizing passages or document structures (e.g., pages/tables), and edges denoting the semantic/lexical similarity between passages or intra-document structural relations. For graph traversal, we design an LLM-based graph traversal agent that navigates across nodes and gathers supporting passages assisting LLMs in MD-QA. The constructed graph serves as the global ruler that regulates the transitional space among passages and reduces retrieval latency. Concurrently, the graph traversal agent acts as a local navigator that gathers pertinent context to progressively approach the question and guarantee retrieval quality. Extensive experiments underscore the efficacy of KGP for MD-QA, signifying the potential of leveraging graphs in enhancing the prompt design for LLMs. Our code: https://github.com/YuWVandy/KG-LLM-MDQA.
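The two modules described above can be illustrated with a small, self-contained sketch. This is not the authors' implementation (see the linked repository for that); it is a toy version under stated assumptions. Every name here (`tokenize`, `overlap`, `build_passage_graph`, `traverse`, the stopword list, the 0.15 threshold) is hypothetical. Lexical similarity is approximated with Jaccard overlap, adjacency of passages within a document stands in for intra-document structural relations, and a plain overlap score stands in for the LLM-based traversal agent.

```python
import re
from collections import deque

# Assumption: a tiny stopword list, only so that toy similarities are meaningful.
STOPWORDS = {"a", "an", "the", "in", "is", "of", "was", "are", "she", "which"}


def tokenize(text):
    """Lowercase word set with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def overlap(a, b):
    """Jaccard overlap between two texts (stand-in for semantic/lexical similarity)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_passage_graph(docs, threshold=0.15):
    """Graph construction: nodes are (doc_id, passage_index) pairs.

    An edge is added when two passages are lexically similar, or when they are
    adjacent in the same document (a simple structural relation).
    """
    nodes = {(d, i): p for d, passages in docs.items() for i, p in enumerate(passages)}
    edges = {n: set() for n in nodes}
    ids = list(nodes)
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            structural = u[0] == v[0] and abs(u[1] - v[1]) == 1
            if structural or overlap(nodes[u], nodes[v]) >= threshold:
                edges[u].add(v)
                edges[v].add(u)
    return nodes, edges


def traverse(question, nodes, edges, score=overlap, seeds=1, branch=1, budget=2):
    """Graph traversal: breadth-first walk guided by a scoring function.

    `score` is a placeholder for the LLM-based agent: it ranks neighbors
    against the question plus the passages gathered so far.
    """
    ranked = sorted(nodes, key=lambda n: (-score(question, nodes[n]), n))
    frontier = deque(ranked[:seeds])
    seen = set(frontier)
    visited = []
    while frontier and len(visited) < budget:
        node = frontier.popleft()
        visited.append(node)
        context = question + " " + " ".join(nodes[n] for n in visited)
        candidates = [n for n in edges[node] if n not in seen]
        candidates.sort(key=lambda n: (-score(context, nodes[n]), n))
        for n in candidates[:branch]:
            seen.add(n)
            frontier.append(n)
    return [nodes[n] for n in visited]


docs = {
    "A": ["Marie Curie was born in Warsaw.", "She won the Nobel Prize in Physics in 1903."],
    "B": ["Warsaw is the capital of Poland.", "Bananas are rich in potassium."],
}
nodes, edges = build_passage_graph(docs)
supporting = traverse("In which country was Marie Curie born?", nodes, edges)
print(supporting)
```

In this toy run the walk starts at the passage that best matches the question and then follows the graph edge into the second document, which is the kind of cross-document hop the abstract describes. The gathered passages would then be placed in the LLM prompt as context.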

Yu Wang, Nedim Lipka, Ryan A. Rossi, Alexa Siu, Ruiyi Zhang, Tyler Derr • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multi-hop Question Answering | 2WikiMultihopQA | EM 36.8 | 387 |
| Multi-hop Question Answering | HotpotQA | F1 Score 58.73 | 294 |
| Question Answering | NQ (test) | -- | 86 |
| Question Answering | 2WikiMultiHopQA (test) | -- | 81 |
| Multi-hop Question Answering | Multi-hop RAG | -- | 77 |
| Question Answering | Mixed Dataset (NQ, PopQA, HotpotQA, 2Wiki) (test) | Accuracy 53.1 | 14 |
| Graph Reasoning | G-bench CS | Inference Time (s) 89.4 | 9 |
| Reasoning explanation generation | G-bench CS (dev) | Average R 58.7 | 7 |
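
For readers unfamiliar with the EM and F1 Score metrics listed above, the following is a minimal sketch of how they are commonly computed for a single question in SQuAD-style evaluation. Whether the listed results use exactly this normalization is an assumption; the helper names (`normalize`, `exact_match`, `f1_score`) are hypothetical. Reported numbers are typically these per-question scores averaged over the dataset and scaled to 0-100.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    shared = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if shared == 0:
        return 0.0
    precision = shared / len(pred_tokens)
    recall = shared / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(round(f1_score("Paris, France", "Paris"), 3))
```

EM rewards only answers that match after normalization, while F1 gives partial credit for overlapping tokens, which is why the two can diverge on longer answers.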
