WhoSaidIt: Human-LLM Collaborative Annotation for Text-Based Multilingual Speaker-Attribute Classification
About
Annotating speaker attributes from text is inherently ambiguous, particularly in multilingual settings where demographic and social cues are implicit and culturally variable. We propose a collaborative re-annotation framework that pairs human experts with large language models (LLMs) to stabilize multilingual speaker-attribute labels under practical resource constraints. Starting from a noisy corpus, we use LLMs to surface recurring annotation rationales through iterative interaction with experts, and we apply disagreement-focused sampling to target re-annotation. Using this framework, we construct WhoSaidIt, a multilingual dataset covering nine speaker-attribute labels. We quantify the divergence between the original and revised annotations, benchmark recent LLMs, and analyze how explicit rationales affect model behavior. Our results reveal substantial cross-lingual differences in annotation decisions and show both the strengths and the limitations of LLMs in speaker-attribute classification.
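The abstract names disagreement-focused sampling and the divergence between original and revised labels but does not say how either is computed. The following Python sketch is one possible reading, under stated assumptions. Each item is assumed to carry its original label plus labels from several LLMs. Disagreement is scored as vote entropy over those labels, and the highest-scoring items are sent for re-annotation within a fixed budget. Divergence is reported as a label-change rate and Cohen's kappa. The function names, the entropy criterion, and the example labels are hypothetical and are not taken from WhoSaidIt.

```python
from collections import Counter
import math


def vote_entropy(labels):
    """Shannon entropy (bits) of the label distribution among the votes."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def disagreement_score(original, llm_votes):
    """Hypothetical score: entropy over the original label plus LLM votes."""
    return vote_entropy([original] + list(llm_votes))


def select_for_reannotation(items, budget):
    """Return ids of up to `budget` items with the highest disagreement.

    `items` maps item id -> (original_label, [llm_label, ...]).
    Items with zero disagreement (all votes agree) are never selected.
    Ties are broken by item id so the selection is deterministic.
    """
    scored = [(disagreement_score(orig, votes), item_id)
              for item_id, (orig, votes) in items.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for score, item_id in scored[:budget] if score > 0]


def label_change_rate(original, revised):
    """Fraction of items whose label changed during re-annotation."""
    changed = sum(o != r for o, r in zip(original, revised))
    return changed / len(original)


def cohens_kappa(a, b):
    """Chance-corrected agreement between two label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both sequences use one identical label everywhere: treat as perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical items: (original label, labels from three LLMs).
    items = {
        "u1": ("male", ["male", "male", "male"]),
        "u2": ("male", ["female", "female", "male"]),
        "u3": ("female", ["male", "female", "unknown"]),
        "u4": ("female", ["female", "female", "female"]),
    }
    print(select_for_reannotation(items, budget=2))  # ['u3', 'u2']

    original = ["male", "male", "female", "female"]
    revised = ["male", "female", "female", "female"]
    print(label_change_rate(original, revised))  # 0.25
    print(cohens_kappa(original, revised))  # 0.5
```

Vote entropy is only one option. A simpler alternative is to flag any item whose LLM majority label differs from the original label. Entropy has the advantage of ranking items, which suits a fixed re-annotation budget.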
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Attribute Classification | WhoSaidIt public release | Accuracy (EN) | 100 | 10 |
| Speaker-attribute classification | Intermediate Dataset original corpus labels (test) | -- | -- | 9 |
| Speaker-attribute classification | WhoSaidIt re-annotated (test) | -- | -- | 8 |
| Speaker-attribute classification | WhoSaidIt English public release | Accuracy (Male) | 100 | 1 |
| Speaker-attribute classification | WhoSaidIt Spanish public release | Accuracy (Male) | 87 | 1 |
| Speaker-attribute classification | WhoSaidIt Italian public release | Accuracy (Male) | 87.9 | 1 |
| Speaker-attribute classification | WhoSaidIt Korean public release subset | Accuracy (Male) | 89.9 | 1 |
| Speaker-attribute classification | WhoSaidIt Chinese public release | Accuracy (Male) | 94.9 | 1 |
| Speaker-attribute classification | WhoSaidIt Pooled Languages public release | Accuracy (Male) | 92 | 1 |