Beyond the Crowd: LLM-Augmented Community Notes for Governing Health Misinformation
About
Community Notes, the crowd-sourced misinformation governance system on X (formerly Twitter), allows users to flag misleading posts, attach contextual notes, and rate the notes' helpfulness. However, our empirical analysis of 30.8K health-related notes reveals substantial latency, with a median delay of 17.6 hours before notes receive a helpfulness status. To improve responsiveness during real-world misinformation surges, we propose CrowdNotes+, a unified LLM-based framework that augments Community Notes for faster and more reliable health misinformation governance. CrowdNotes+ integrates two modes: (1) evidence-grounded note augmentation and (2) utility-guided note automation, supported by a hierarchical three-stage evaluation of relevance, correctness, and helpfulness. We instantiate the framework with HealthNotes, a benchmark of 1.2K health notes annotated for helpfulness, and a fine-tuned helpfulness judge. Our analysis first uncovers a key loophole in current crowd-sourced governance: voters frequently conflate stylistic fluency with factual accuracy. Addressing this via our hierarchical evaluation, experiments across 15 representative LLMs demonstrate that CrowdNotes+ significantly outperforms human contributors in note correctness, helpfulness, and evidence utility.
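The hierarchical three-stage evaluation can be read as a gated pipeline: a note is checked for relevance first, for correctness only if it is relevant, and for helpfulness only if it is also correct. The abstract does not say how the stages are combined, so the gating order, the `StageResult` type, and the three judge callables below are illustrative assumptions rather than the authors' implementation. A minimal sketch:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StageResult:
    """Outcome of the staged evaluation; later stages stay None if never reached."""
    relevance: bool
    correctness: Optional[bool] = None
    helpfulness: Optional[float] = None


def evaluate_note(
    post: str,
    note: str,
    is_relevant: Callable[[str, str], bool],          # hypothetical relevance judge
    is_correct: Callable[[str, str], bool],           # hypothetical correctness judge
    helpfulness_score: Callable[[str, str], float],   # hypothetical helpfulness judge
) -> StageResult:
    # Stage 1: an off-topic note is rejected without further checks.
    if not is_relevant(post, note):
        return StageResult(relevance=False)
    # Stage 2: a relevant but factually wrong note is never scored for helpfulness,
    # so fluent style alone cannot earn a high rating.
    if not is_correct(post, note):
        return StageResult(relevance=True, correctness=False)
    # Stage 3: only relevant and correct notes receive a helpfulness score.
    return StageResult(True, True, helpfulness_score(post, note))
```

Under this assumed gating, a well-written but incorrect note stops at stage 2, which is one way to keep stylistic fluency from being mistaken for factual accuracy.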
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Note Generation | MM-HEALTHCN Text | Relevance (R) | 88.5 | 12 |
| Note Generation | MM-HEALTHCN Image | Relevance (R) | 87.5 | 12 |
| Note Generation | MM-HEALTHCN Video | Relevance (R) | 87.75 | 12 |
| Note Generation | MM-HEALTHCN (Overall) | Note Helpfulness (H) | 64.92 | 12 |
| Community Note Effectiveness Evaluation | NMR Text (Needs More Ratings) | Helpfulness | 66 | 10 |
| Community Note Effectiveness Evaluation | NMR Image (Needs More Ratings) | Relevance | 95 | 5 |
| Community Note Effectiveness Evaluation | NMR Video | Relevance | 94 | 5 |