Labels have Human Values: Value Calibration of Subjective Tasks
About
Building NLP systems for subjective tasks requires ensuring their alignment with contrasting human values. We propose the MultiCalibrated Subjective Task Learner framework (MC-STL), which clusters annotations into identifiable human value clusters using three approaches (similarity of annotator rationales, expert-value taxonomies, or raters' sociocultural descriptors) and calibrates predictions for each value cluster by learning cluster-specific embeddings. We demonstrate MC-STL in several subjective learning settings, including ordinal, binary, and preference-learning prediction, and evaluate it on multiple datasets covering toxic chatbot conversations, offensive social media posts, and human preference alignment. The results show that MC-STL consistently outperforms baselines that ignore the latent value structure of the annotations, delivering gains in discrimination, value-specific calibration, and disagreement-aware metrics.
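To make the phrase "value-specific calibration" concrete, the sketch below computes a binned expected calibration error (ECE) separately for each value cluster. This is an illustration only, not the paper's evaluation code: the abstract does not state which calibration metric MC-STL reports, and the function names `expected_calibration_error` and `per_cluster_ece`, the cluster labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary predictions.

    Returns the bin-weighted mean of |observed positive rate - mean predicted
    probability|. Assumes probs are in [0, 1] and labels are 0 or 1.
    """
    bins = defaultdict(list)
    for p, y in zip(probs, labels):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for items in bins.values():
        mean_conf = sum(p for p, _ in items) / len(items)
        pos_rate = sum(y for _, y in items) / len(items)
        ece += (len(items) / total) * abs(pos_rate - mean_conf)
    return ece


def per_cluster_ece(probs, labels, clusters, n_bins=10):
    """Group predictions by (hypothetical) value-cluster id and score each group."""
    grouped = defaultdict(lambda: ([], []))
    for p, y, c in zip(probs, labels, clusters):
        grouped[c][0].append(p)
        grouped[c][1].append(y)
    return {
        c: expected_calibration_error(ps, ys, n_bins)
        for c, (ps, ys) in grouped.items()
    }


# Toy data (made up): two value clusters sharing one model's predictions.
probs = [0.9, 0.9, 0.1, 0.1]
labels = [1, 1, 0, 1]
clusters = ["a", "a", "b", "b"]
scores = per_cluster_ece(probs, labels, clusters)
```

In this toy data, cluster `"b"` is scored as much less well calibrated than cluster `"a"`, even though both share the same model. A single pooled ECE would blur that difference, which is the kind of gap a per-cluster view is meant to expose.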
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Binary Classification | VP+Schwartz Binary (test) | Overall AUC | 0.8 | 3 |
| Preference Learning | Anthropic HH-RLHF+VI Preference (test) | Overall Accuracy | 64 | 3 |
| Rater Rationale Clustering | VP-Duty Binary (test) | Overall AUC | 0.76 | 3 |
| Rater Rationale Clustering | VP-Right Ordinal (test) | Overall AUC | 75 | 3 |
| Rater Rationale Clustering | VP-Duty Ordinal (test) | Overall AUC | 70 | 3 |
| Rater Rationale Clustering | VP-Right Binary (test) | Overall AUC | 81 | 3 |
| Rater Rationale Clustering | VP-Value Binary (test) | Overall AUC | 80 | 3 |
| Rater Rationale Clustering | VP-Value Ordinal (test) | Overall AUC | 70 | 3 |
| Sociocultural clustering | DICES-990 Ordinal | Overall AUC | 75 | 3 |
| Sociocultural clustering | D3 Binary | AUC (Overall) | 0.63 | 3 |