EUvsDisinfo: A Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles

About

This work introduces EUvsDisinfo, a multilingual dataset of disinformation articles originating from pro-Kremlin outlets, along with trustworthy articles from credible or less biased sources. It is sourced directly from the debunk articles written by the experts leading the EUvsDisinfo project. Our dataset is the largest resource to date in terms of the overall number of articles and distinct languages. It also provides the largest topical and temporal coverage. Using this dataset, we investigate the dissemination of pro-Kremlin disinformation across different languages, uncovering language-specific patterns that target certain disinformation topics. We further analyse the evolution of the topic distribution over an eight-year period, noting a significant surge in disinformation content before the full-scale invasion of Ukraine in 2022. Lastly, we demonstrate the dataset's applicability in training models to effectively distinguish between disinformation and trustworthy content in multilingual settings.

João A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva, Carolina Scarton • 2024

Related benchmarks

Task	Dataset	Result
Tool-invocation hijacking to RCE	Agent Security Scenarios Old LLM Lineup	RCE Success Rate: 10	20
Safety Control	BlackSheep Llama3.2-3B	Safety-Quality Score (P_safeguarded): 25.8	17
Safety Control	DialoGPT large	Safety-Quality Score: 0.077	17
Safety Control	DeepSeek-R1-Distill-Qwen-1.5B	P_safeguarded (Safety-Quality Score): 6.9	17
Safety Control	Evil-Alpaca 3B L3.2	Safety-Quality Score (P_safeguarded): 44.5	17
Safety Control	Macro Metrics Aggregate across LLMs	Macro-P Safeguarded Safety-Quality Score: 21.2	17
Tool-invocation hijacking to RCE	Agent Security Scenarios New LLM Lineup	Gemini-3.1-pro RCE Success Rate: 10	5

