
Better with Experience: Self-Evolving LLM Agents for Evidence-Grounded Health Community Notes

About

Large Language Model (LLM)-augmented Community Notes offer a scalable path for timely, evidence-grounded correction of health misinformation on social platforms. However, existing systems handle each post from scratch, so correction experience gained on earlier cases goes unused. We introduce EvoNote, an agentic framework that enables health Community Notes generation to self-evolve through an evolving experience memory of prior misinformation correction episodes. Its core is fine-grained credit assignment: EvoNote grounds trajectory-level feedback in health-specific note qualities and distills it into action-level memory for claim analysis, evidence acquisition, and note writing. We evaluate EvoNote on MM-HealthCN, a 1.2K-instance multimodal benchmark of user-flagged health posts with human-written Community Notes and crowd-derived helpfulness labels. Under a human-validated hierarchical utility judge, EvoNote-generated notes are preferred over the corresponding human-written notes in 89.6% of cases; on a separate set of Needs More Ratings posts that lack a crowd helpfulness verdict, EvoNote produces helpful notes in 82.0% of cases. It also reduces the median time needed to produce a candidate correction from over 13 hours in the human-note pipeline to under 2 minutes. Analyses link these gains to stronger evidence use and reusable correction strategies, positioning self-evolving note generation as a promising paradigm for health misinformation governance.
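The abstract describes the experience memory only at a high level. Below is a minimal Python sketch, built on assumptions of our own, of how trajectory-level feedback could be distilled into action-level lessons; it is not EvoNote's implementation. Only the three stages (claim analysis, evidence acquisition, note writing) come from the abstract. The names `ExperienceMemory`, `Lesson`, and `QUALITY_TO_STAGE`, the quality dimensions, and the credit rule (average the scores of the qualities mapped to each stage) are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # The three action types named in the abstract.
    CLAIM_ANALYSIS = "claim_analysis"
    EVIDENCE_ACQUISITION = "evidence_acquisition"
    NOTE_WRITING = "note_writing"


# Hypothetical mapping from note-quality dimensions to the stage assumed
# to be responsible for them (illustrative only, not from the paper).
QUALITY_TO_STAGE = {
    "claim_coverage": Stage.CLAIM_ANALYSIS,
    "evidence_support": Stage.EVIDENCE_ACQUISITION,
    "clarity": Stage.NOTE_WRITING,
}


@dataclass
class Lesson:
    stage: Stage
    text: str
    credit: float  # share of trajectory feedback attributed to this action


@dataclass
class ExperienceMemory:
    lessons: list[Lesson] = field(default_factory=list)

    def add_episode(self, actions: dict[Stage, str], quality: dict[str, float]) -> None:
        """Distill one trajectory's quality scores into per-stage lessons."""
        per_stage: dict[Stage, list[float]] = {}
        for name, score in quality.items():
            stage = QUALITY_TO_STAGE.get(name)
            if stage is not None:
                per_stage.setdefault(stage, []).append(score)
        # Toy credit assignment: each stage's action receives the mean score
        # of the quality dimensions mapped to that stage.
        for stage, scores in per_stage.items():
            if stage in actions:
                self.lessons.append(Lesson(stage, actions[stage], sum(scores) / len(scores)))

    def retrieve(self, stage: Stage, k: int = 2) -> list[str]:
        """Return the k highest-credit lessons recorded for one stage."""
        ranked = sorted(
            (lesson for lesson in self.lessons if lesson.stage is stage),
            key=lambda lesson: lesson.credit,
            reverse=True,
        )
        return [lesson.text for lesson in ranked[:k]]


# Usage: two hypothetical episodes, then retrieval before a new post.
memory = ExperienceMemory()
memory.add_episode(
    {Stage.EVIDENCE_ACQUISITION: "Check a public-health agency source first",
     Stage.NOTE_WRITING: "Lead with the corrected fact"},
    {"evidence_support": 0.9, "clarity": 0.4},
)
memory.add_episode(
    {Stage.EVIDENCE_ACQUISITION: "Rely only on the link in the post"},
    {"evidence_support": 0.2},
)
print(memory.retrieve(Stage.EVIDENCE_ACQUISITION))
# → ['Check a public-health agency source first', 'Rely only on the link in the post']
```

Averaging mapped scores is the simplest possible credit rule; a real system would presumably use richer per-step feedback and similarity-based retrieval against the incoming post rather than ranking lessons by credit alone.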

Zihang Fu, Fanxiao Li, Jianyang Gu, Haonan Wang, Preslav Nakov, Bryan Hooi, Min-Yen Kan, Jiaying Wu · 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Note Generation | MM-HealthCN Text | Relevance (R) | 97.5 | 12
Note Generation | MM-HealthCN Image | Relevance (R) | 96.5 | 12
Note Generation | MM-HealthCN Video | Relevance (R) | 97.0 | 12
Note Generation | MM-HealthCN (Overall) | Note Helpfulness (H) | 81.17 | 12
Community Note Effectiveness Evaluation | NMR Text (Needs More Ratings) | Helpfulness | 82 | 10
Community Note Effectiveness Evaluation | NMR Image (Needs More Ratings) | Relevance | 100 | 5
Community Note Effectiveness Evaluation | NMR Video (Needs More Ratings) | Relevance | 96 | 5