Labels have Human Values: Value Calibration of Subjective Tasks

About

Building NLP systems for subjective tasks requires ensuring that they align with contrasting human values. We propose the MultiCalibrated Subjective Task Learner (MC-STL) framework, which groups annotations into identifiable human value clusters using three approaches (similarity of annotator rationales, expert value taxonomies, or raters' sociocultural descriptors) and calibrates predictions for each value cluster by learning cluster-specific embeddings. We demonstrate MC-STL in several subjective learning settings, including ordinal, binary, and preference-learning prediction, and evaluate it on multiple datasets covering toxic chatbot conversations, offensive social media posts, and human preference alignment. The results show that MC-STL consistently outperforms baselines that ignore the latent value structure of the annotations, with gains in discrimination, value-specific calibration, and disagreement-aware metrics.
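To make the idea of cluster-specific embeddings concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the class name `ClusterConditionedScorer`, the toy data, and the choice of a logistic model with one additive embedding per cluster are all assumptions made for illustration. The cluster ids are taken as given, so the clustering step itself is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ClusterConditionedScorer:
    """Hypothetical toy model: shared weights plus one embedding per value cluster."""

    def __init__(self, n_features, n_clusters):
        self.w = np.zeros(n_features)                # shared across all clusters
        self.E = np.zeros((n_clusters, n_features))  # cluster-specific embeddings
        self.b = np.zeros(n_clusters)                # cluster-specific biases

    def predict_proba(self, X, clusters):
        # logit_i = x_i . (w + E[c_i]) + b[c_i]
        W = self.w + self.E[clusters]
        return sigmoid(np.sum(X * W, axis=1) + self.b[clusters])

    def fit(self, X, y, clusters, lr=0.5, epochs=500):
        n = len(y)
        for _ in range(epochs):
            err = self.predict_proba(X, clusters) - y  # d(log loss)/d(logit)
            grad_rows = err[:, None] * X
            grad_E = np.zeros_like(self.E)
            grad_b = np.zeros_like(self.b)
            np.add.at(grad_E, clusters, grad_rows)     # sum gradients per cluster
            np.add.at(grad_b, clusters, err)
            self.w -= lr * grad_rows.mean(axis=0)
            self.E -= lr * grad_E / n
            self.b -= lr * grad_b / n
        return self

# Toy data (assumed): two value clusters that label the same input in opposite ways.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 1))
clusters = rng.integers(0, 2, size=400)
y = np.where(clusters == 0, X[:, 0] > 0, X[:, 0] < 0).astype(float)

model = ClusterConditionedScorer(n_features=1, n_clusters=2).fit(X, y, clusters)
acc = np.mean((model.predict_proba(X, clusters) > 0.5) == (y == 1))

# The same input scored under each cluster's embedding.
p_same_text = model.predict_proba(np.array([[1.0], [1.0]]), np.array([0, 1]))
```

In this toy setting a single cluster-agnostic scorer cannot fit both groups, since they disagree on every input, while the cluster-conditioned scorer can assign the same input a different probability for each value cluster.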

Mohammed Fayiz Parappan, Ricardo Henao • 2026

Related benchmarks

Task	Dataset	Result
Binary Classification	VP+Schwartz Binary (test)	Overall AUC 0.8	3
Preference Learning	Anthropic HH-RLHF+VI Preference (test)	Overall Accuracy 64	3
Rater Rationale Clustering	VP-Duty Binary (test)	Overall AUC 0.76	3
Rater Rationale Clustering	VP-Right Ordinal (test)	Overall AUC 75	3
Rater Rationale Clustering	VP-Duty Ordinal (test)	Overall AUC 70	3
Rater Rationale Clustering	VP-Right Binary (test)	Overall AUC 81	3
Rater Rationale Clustering	VP-Value Binary (test)	Overall AUC 80	3
Rater Rationale Clustering	VP-Value Ordinal (test)	Overall AUC 70	3
Sociocultural Clustering	DICES-990 Ordinal	Overall AUC 75	3
Sociocultural Clustering	D3 Binary	AUC (Overall) 0.63	3
