
MERMAID: Memory-Enhanced Retrieval and Reasoning with Multi-Agent Iterative Knowledge Grounding for Veracity Assessment

About

Assessing the veracity of online content has become increasingly critical. Large language models (LLMs) have recently enabled substantial progress in automated veracity assessment, including automated fact-checking and claim verification systems. Typical veracity assessment pipelines break complex claims down into sub-claims, retrieve external evidence, and then apply LLM reasoning to assess veracity. However, existing methods often treat evidence retrieval as a static, isolated step and do not effectively manage or reuse retrieved evidence across claims. In this work, we propose MERMAID, a memory-enhanced multi-agent veracity assessment framework that tightly couples the retrieval and reasoning processes. MERMAID integrates agent-driven search, structured knowledge representations, and a persistent memory module within a Reason-Action-style iterative process, enabling dynamic evidence acquisition and cross-claim evidence reuse. By retaining retrieved evidence in an evidence memory, the framework reduces redundant searches and improves verification efficiency and consistency. We evaluate MERMAID on three fact-checking benchmarks and two claim-verification datasets using LLMs from the GPT, LLaMA, and Qwen families. Experimental results show that MERMAID achieves state-of-the-art performance while improving search efficiency, demonstrating the effectiveness of synergizing retrieval, reasoning, and memory for reliable veracity assessment.
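The abstract does not give implementation details, so the following is only a minimal Python sketch of the general idea it describes: an iterative retrieve-then-judge loop per sub-claim, backed by an evidence memory that lets later claims reuse evidence already fetched for earlier ones. Everything here is an assumption for illustration: the names `EvidenceMemory`, `verify_sub_claim`, `verify_claim`, `normalize`, `max_steps`, and the toy `toy_search`/`toy_judge` stand-ins are hypothetical, and an exact-match cache keyed on normalized query strings is swapped in for whatever memory and knowledge representation MERMAID actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical component signatures (illustrative, not taken from the paper).
SearchFn = Callable[[str], List[str]]  # query -> evidence snippets
JudgeFn = Callable[[str, List[str]], Tuple[Optional[bool], str]]  # -> (verdict or None, next query)


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivially different queries share a key."""
    return " ".join(query.lower().split())


@dataclass
class EvidenceMemory:
    """Hypothetical cross-claim evidence cache: each distinct normalized query is searched once."""
    store: Dict[str, List[str]] = field(default_factory=dict)
    searches: int = 0  # calls that actually reached the search backend
    reuses: int = 0    # lookups answered from memory instead

    def retrieve(self, query: str, search: SearchFn) -> List[str]:
        key = normalize(query)
        if key in self.store:
            self.reuses += 1
        else:
            self.searches += 1
            self.store[key] = search(query)
        return self.store[key]


def verify_sub_claim(sub_claim: str, memory: EvidenceMemory, search: SearchFn,
                     judge: JudgeFn, max_steps: int = 3) -> Optional[bool]:
    """Iterate retrieve -> judge; the judge either decides or proposes a follow-up query."""
    evidence: List[str] = []
    query = sub_claim
    for _ in range(max_steps):
        evidence.extend(memory.retrieve(query, search))
        verdict, query = judge(sub_claim, evidence)
        if verdict is not None:
            return verdict
    return None  # still undecided after max_steps


def verify_claim(sub_claims: List[str], memory: EvidenceMemory, search: SearchFn,
                 judge: JudgeFn) -> Optional[bool]:
    """False if any sub-claim is refuted, True if all are supported, else None."""
    verdicts = [verify_sub_claim(s, memory, search, judge) for s in sub_claims]
    if any(v is False for v in verdicts):
        return False
    if all(v is True for v in verdicts):
        return True
    return None


# Toy stand-ins for a search agent and an LLM judge (hypothetical).
CORPUS = {
    "paris is the capital of france": ["Paris is the capital of France."],
    "the eiffel tower is in paris": ["The Eiffel Tower is located in Paris."],
}


def toy_search(query: str) -> List[str]:
    return CORPUS.get(normalize(query), [])


def toy_judge(sub_claim: str, evidence: List[str]) -> Tuple[Optional[bool], str]:
    # Supported as soon as any evidence exists; otherwise ask again with the
    # same query (a real judge would propose a refined follow-up query).
    return (True, sub_claim) if evidence else (None, sub_claim)


if __name__ == "__main__":
    memory = EvidenceMemory()
    print(verify_claim(["Paris is the capital of France", "The Eiffel Tower is in Paris"],
                       memory, toy_search, toy_judge))
    # A second claim repeating a sub-claim (different casing/spacing) reuses stored evidence.
    print(verify_claim(["paris is the  capital of France"], memory, toy_search, toy_judge))
    print(memory.searches, memory.reuses)  # 2 1
```

In this sketch the second claim triggers no new backend search because its sub-claim normalizes to a key already in memory, which is the kind of redundant-search saving the abstract attributes to the evidence memory. A real system would likely need fuzzier matching (for example, embedding similarity) than exact normalized strings.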

Yupeng Cao, Chengyang He, Yangyang Yu, Ping Wang, K.P. Subbalakshmi • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Veracity Assessment | FactCheck-Bench | Macro-F1 | 77 | 26 |
| Scientific Fact Verification | SciFact | Macro-F1 | 0.7 | 16 |
| Veracity Assessment | FacTool-QA | True F1 | 92 | 12 |
| Veracity Assessment | BingCheck | True F1 | 0.88 | 12 |
| Multi-hop Fact Verification | HOVER 3-hop | Macro-F1 | 58 | 7 |
| Multi-hop Fact Verification | HOVER 4-hop | Macro-F1 | 63 | 7 |
| Multi-hop Fact Verification | HOVER 2-hop | Macro-F1 | 71 | 7 |
