
Multi-Perspective LLM Annotations for Valid Analyses in Subjective Tasks

About

Large language models are increasingly used to annotate texts, but their outputs reflect some human perspectives better than others. Existing methods for correcting LLM annotation error assume a single ground truth. However, this assumption fails in subjective tasks where disagreement across demographic groups is meaningful. Here we introduce Perspective-Driven Inference, a method that treats the distribution of annotations across groups as the quantity of interest, and estimates it using a small human annotation budget. We contribute an adaptive sampling strategy that concentrates human annotation effort on groups where LLM proxies are least accurate. We evaluate on politeness and offensiveness rating tasks, showing targeted improvements for harder-to-model demographic groups relative to uniform sampling baselines, while maintaining coverage.
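The abstract does not state the allocation rule behind the adaptive sampling strategy. As a minimal sketch of the general idea only, the Python below splits a fixed human-annotation budget across demographic groups in proportion to an estimated LLM-proxy error per group, so harder-to-model groups receive more human labels. The function name `allocate_budget`, the proportional rule, the `min_per_group` floor, and the error values are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict


def allocate_budget(
    proxy_error: Dict[str, float],
    total_budget: int,
    min_per_group: int = 0,
) -> Dict[str, int]:
    """Split a human-annotation budget across groups in proportion to
    each group's estimated LLM-proxy error (hypothetical rule)."""
    groups = list(proxy_error)
    if total_budget < min_per_group * len(groups):
        raise ValueError("budget too small for the per-group minimum")
    if any(e < 0 for e in proxy_error.values()):
        raise ValueError("errors must be non-negative")

    remaining = total_budget - min_per_group * len(groups)
    total_error = sum(proxy_error.values())
    # Fall back to a uniform split when no group shows any error.
    weights = {
        g: (proxy_error[g] / total_error if total_error > 0 else 1 / len(groups))
        for g in groups
    }

    # Largest-remainder rounding so the integer counts sum to the budget.
    raw = {g: remaining * weights[g] for g in groups}
    alloc = {g: min_per_group + int(raw[g]) for g in groups}
    leftover = total_budget - sum(alloc.values())
    by_remainder = sorted(groups, key=lambda g: raw[g] - int(raw[g]), reverse=True)
    for g in by_remainder[:leftover]:
        alloc[g] += 1
    return alloc


# Hypothetical error estimates for three age groups (not from the paper).
example = allocate_budget({"18-34": 0.25, "35-49": 0.25, "50+": 0.5}, 55, 5)
```

The per-group floor keeps every group represented, which matters if interval coverage is to be maintained for groups the LLM already models well. A uniform sampling baseline corresponds to passing equal error values for all groups.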

Navya Mehrotra, Adam Visokay, Kristina Gligorić • 2026

Related benchmarks

Task                  Dataset                        Result              Rank
Politeness Rating     POPQUORN Avg Age               Coverage 95         10
Offensiveness Rating  POPQUORN Age 50+ 1.0 (test)    Coverage 95         5
Politeness Rating     POPQUORN (Age 50+)             Coverage 95         5
Offensiveness Rating  POPQUORN Avg Age 1.0 (test)    Coverage 95         5
Offensiveness Rating  POPQUORN Age 18–34 1.0 (test)  Coverage (Cov.) 95  5
Offensiveness Rating  POPQUORN Age 35–49 1.0 (test)  Coverage 95         5
Politeness Rating     POPQUORN (Age 18–34)           Coverage 90         5
