Better with Experience: Self-Evolving LLM Agents for Evidence-Grounded Health Community Notes
About
Large Language Model (LLM)-augmented Community Notes offer a scalable path for timely, evidence-grounded correction of health misinformation on social platforms. However, they still reset at every post, leaving useful correction experience from prior cases unused. We introduce EvoNote, an agentic framework that enables health Community Notes generation to self-evolve through an evolving experience memory of prior misinformation correction episodes. Its core is fine-grained credit assignment: EvoNote grounds trajectory-level feedback in health-specific note qualities and distills it into action-level memory for claim analysis, evidence acquisition, and note writing. We evaluate EvoNote on MM-HealthCN, a 1.2K-instance multimodal benchmark of user-flagged health posts with human-written Community Notes and crowd-derived helpfulness labels. Under a human-validated hierarchical utility judge, EvoNote-generated notes are preferred over corresponding human-written notes in 89.6% of cases; on a separate set of Needs More Ratings posts without a crowd helpfulness verdict, EvoNote produces helpful notes for 82.0% of cases. It also reduces the median time needed to produce a candidate correction from over 13 hours in the human-note pipeline to under 2 minutes. Analyses link these gains to stronger evidence use and reusable correction strategies, positioning self-evolving note generation as a promising paradigm for health misinformation governance.
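The credit-assignment idea in the abstract (trajectory-level feedback distilled into action-level memory for claim analysis, evidence acquisition, and note writing) can be illustrated with a minimal sketch. This is not EvoNote's implementation: the class name `ExperienceMemory`, the quality names `evidence_support` and `clarity`, the quality-to-stage mapping, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical mapping from a note quality to the action stage credited for it.
# The three stage names follow the abstract; the mapping itself is an assumption.
QUALITY_TO_STAGE = {
    "relevance": "claim_analysis",
    "evidence_support": "evidence_acquisition",
    "clarity": "note_writing",
}


@dataclass
class ExperienceMemory:
    """Toy action-level memory: one list of lessons per stage."""

    lessons: dict = field(default_factory=lambda: defaultdict(list))

    def distill(self, episode_id, quality_scores, threshold=0.5):
        """Route each trajectory-level quality score to the stage it is credited to."""
        for quality, score in quality_scores.items():
            stage = QUALITY_TO_STAGE[quality]
            outcome = "worked" if score >= threshold else "failed"
            self.lessons[stage].append((episode_id, quality, outcome, score))

    def retrieve(self, stage, k=3):
        """Return the k most recent lessons recorded for one stage."""
        return self.lessons[stage][-k:]


memory = ExperienceMemory()
memory.distill("post-001", {"relevance": 0.9, "evidence_support": 0.3, "clarity": 0.8})
memory.distill("post-002", {"relevance": 0.4, "evidence_support": 0.7, "clarity": 0.6})
print(memory.retrieve("evidence_acquisition"))
```

In this sketch, a later episode would call `retrieve` for the stage it is about to execute, so lessons from earlier posts inform that step instead of being discarded.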
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Note Generation | MM-HEALTHCN Text | Relevance (R) | 97.5 | 12 |
| Note Generation | MM-HEALTHCN Image | Relevance (R) | 96.5 | 12 |
| Note Generation | MM-HEALTHCN Video | Relevance (R) | 97 | 12 |
| Note Generation | MM-HEALTHCN (Overall) | Note Helpfulness (H) | 81.17 | 12 |
| Community Note Effectiveness Evaluation | NMR Text (Needs More Ratings) | Helpfulness | 82 | 10 |
| Community Note Effectiveness Evaluation | NMR Image (Needs More Ratings) | Relevance | 100 | 5 |
| Community Note Effectiveness Evaluation | NMR Video | Relevance | 96 | 5 |