XNote: Benchmarking Automated Community Notes Generation for Image-based Contextual Deception
About
Community Notes have emerged as an effective crowd-sourced mechanism for combating online deception on social media platforms. However, their reliance on human contributors limits both their timeliness and their scalability. In this work, we study the automated Community Notes generation task for image-based contextual deception, where an authentic image is paired with misleading context (e.g., time, entity, and event). Unlike prior work that primarily focuses on deception detection (i.e., judging whether a post is true or false in a binary manner), automated Community Notes generation requires producing concise and grounded notes that help users recover the missing or corrected context. This problem remains underexplored due to the scarcity of datasets that support this task. To address this gap, we curate a real-world dataset, XNote, comprising X posts with associated Community Notes and external contexts, along with annotations of topics and deceptive factors. We further benchmark a range of frontier large vision language models (LVLMs) on XNote, evaluating their performance on both deception detection and note generation tasks. We also compare against an end-to-end approach, SNIFFER, and a commercial tool, GPT-5. Our results highlight the challenges in automated Community Notes generation, underscoring the need for improved methods and metrics tailored for this task.
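Deception detection, as framed above, is a binary judgment per post, and the benchmark table reports it as an F1 score. The abstract does not state which class is treated as positive or whether the score is averaged across classes, so the sketch below is only an illustration under stated assumptions: label `1` means "deceptive", only that class is scored, and the helper name `binary_f1` is hypothetical.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 on a single positive class.

    Assumption (not from the source): positive=1 marks a deceptive post.
    Returns a value in [0, 1]; multiply by 100 for a percentage.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision or recall is 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels, not XNote data: 2 TP, 1 FP, 1 FN -> precision = recall = 2/3.
    print(round(binary_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]), 4))
```

If the benchmark instead uses macro-averaged F1, the same counts would be computed for each class and the per-class scores averaged.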
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Deception Detection | XCHECK 1.0 (test) | F1 Score | 87.44 | 28 |
| Note Generation | XCHECK 1.0 (test) | ROUGE-L (R-L) | 0.1507 | 28 |
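Note generation is scored with ROUGE-L, which measures how much of a reference note's word order is recovered by the generated note through their longest common subsequence (LCS). The sketch below is a simplified illustration, not the benchmark's evaluation code: it assumes lowercase whitespace tokenization and the balanced F1 form (beta = 1), and the names `lcs_length` and `rouge_l_f1` are hypothetical. Reference implementations such as the `rouge-score` Python package apply their own tokenization and optional stemming, so their scores will not match this exactly.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a token in a or b
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between a generated note and a reference note.

    Assumption: lowercase whitespace tokens, no stemming, beta = 1.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of the generated note that is matched
    recall = lcs / len(ref)      # share of the reference note that is recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy notes, not XNote data: LCS = 5 tokens, P = 5/8, R = 5/6, F1 = 5/7.
    print(round(rouge_l_f1("the photo was taken in 2015 not 2023",
                           "this photo was taken in 2015"), 4))
```

Because ROUGE-L rewards surface overlap only, a factually correct note phrased differently from the human-written Community Note can still score low, which is one reason the abstract calls for metrics tailored to this task.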