
Fine-grained Fallacy Detection with Human Label Variation

About

We introduce Faina, the first dataset for fallacy detection that embraces multiple plausible answers and natural disagreement. Faina includes over 11K span-level annotations with overlaps across 20 fallacy types on social media posts in Italian about migration, climate change, and public health given by two expert annotators. Through an extensive annotation study that allowed discussion over multiple rounds, we minimize annotation errors whilst keeping signals of human label variation. Moreover, we devise a framework that goes beyond "single ground truth" evaluation and simultaneously accounts for multiple (equally reliable) test sets and the peculiarities of the task, i.e., partial span matches, overlaps, and the varying severity of labeling errors. Our experiments across four fallacy detection setups show that multi-task and multi-label transformer-based approaches are strong baselines across all settings. We release our data, code, and annotation guidelines to foster research on fallacy detection and human label variation more broadly.
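To make the evaluation idea concrete, the following is a minimal sketch of span-level scoring that gives partial credit for partial matches and averages over several reference annotation sets. It is an illustration under stated assumptions, not the metric defined in the paper. The function names (`overlap`, `coverage`, `soft_f1`, `multi_reference_f1`), the character-offset span format, and the choice of a plain mean across references are all hypothetical. The sketch also does not weight errors by severity.

```python
from typing import List, Tuple

# Assumed span format: (start, end_exclusive, label), offsets in characters.
Span = Tuple[int, int, str]


def overlap(a: Span, b: Span) -> int:
    """Characters shared by two spans; 0 if their labels differ."""
    if a[2] != b[2]:
        return 0
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def coverage(sources: List[Span], targets: List[Span]) -> float:
    """Mean, over `sources`, of the best fraction of each span covered
    by a single same-label span in `targets` (partial-match credit)."""
    if not sources:
        return 0.0
    total = 0.0
    for s in sources:
        length = s[1] - s[0]
        if length <= 0:
            continue  # empty spans earn no credit
        best = max((overlap(s, t) for t in targets), default=0)
        total += best / length
    return total / len(sources)


def soft_f1(pred: List[Span], gold: List[Span]) -> float:
    """Harmonic mean of soft precision and soft recall for one reference set."""
    precision = coverage(pred, gold)
    recall = coverage(gold, pred)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def multi_reference_f1(pred: List[Span], references: List[List[Span]]) -> float:
    """Average soft F1 over several reference sets (e.g. one per annotator),
    so that no single annotation is treated as the only ground truth."""
    if not references:
        raise ValueError("at least one reference set is required")
    return sum(soft_f1(pred, gold) for gold in references) / len(references)
```

Because spans are compared independently, overlapping spans with different labels can coexist in both predictions and references. A stricter "hard" variant could be obtained by counting only exact boundary matches instead of fractional coverage.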

Alan Ramponi, Agnese Daffara, Sara Tonelli • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Common Phrase Labeling | CPL | Soft F1 | 79.9 | 40
CPL | CPL | Hard F1 Score | 79.9 | 40
Grammar Error Correction | GEC | Soft F1 | 31.8 | 40
Grammatical Error Correction | GEC | Hard F1 Score | 23.2 | 40
Named Entity Recognition | NER | Soft F1 | 75.4 | 40
Named Entity Recognition | NER | Hard F1 Score | 69.1 | 40
ESA-MT | ESA-MT | Hard F1 Score | 10 | 40
Entity-aware Sentence Alignment | ESA-MT | Soft F1 | 21.2 | 40
