
Concept Training for Human-Aligned Language Models

About

The next-token prediction (NTP) objective trains language models to predict a single continuation token at each step. In natural language, however, a prefix can be continued in many valid ways, and even similar meanings may differ in surface form. For example, the prefix "this website is safe to ___" can plausibly be completed with words such as browse, search, visit, surf, or navigate. While standard NTP training treats these alternatives as mutually exclusive targets, we explore a framework that instead predicts concepts, approximated as sets of semantically related tokens. We show that models trained with concept supervision exhibit stronger alignment with human semantic similarity judgments on multiple lexical benchmarks. These gains are accompanied by lower perplexity on semantically meaningful words (defined in Section 3.1) and a modest increase in global token-level perplexity, reflecting a tradeoff between standard NTP optimization and concept-level supervision. Our results suggest that concept-level objectives can improve semantic alignment while maintaining competitive language modeling performance.
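The page does not include the training objective itself. As a minimal sketch of what concept-level supervision could look like, the snippet below replaces the one-hot NTP target with a soft distribution spread over a set of semantically related token ids. The function name concept_loss, the uniform weighting over the concept set, and the PyTorch framing are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def concept_loss(logits, concept_token_ids):
    """Cross-entropy against a soft target spread over a concept's token set.

    logits:            (batch, vocab) next-token logits from the LM head
    concept_token_ids: per-example lists of token ids belonging to the
                       target concept (e.g. browse, visit, surf, navigate)
    """
    batch, vocab = logits.shape
    targets = torch.zeros(batch, vocab, device=logits.device)
    for i, ids in enumerate(concept_token_ids):
        # Uniform mass over the concept set; a similarity-weighted
        # distribution would be another natural choice.
        targets[i, ids] = 1.0 / len(ids)
    log_probs = F.log_softmax(logits, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()


# Toy usage: vocabulary of 10 tokens, two examples whose target concepts
# are {2, 5, 7} and {1, 4} respectively.
if __name__ == "__main__":
    logits = torch.randn(2, 10)
    print(concept_loss(logits, [[2, 5, 7], [1, 4]]))
```

With a singleton concept set this reduces to the standard NTP cross-entropy, so the soft-target view can be read as a generalization of the usual objective rather than a replacement for it.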

Christine Zhang, Dan Jurafsky, Chen Shani • 2026

Related benchmarks

| Task                         | Dataset                 | Metric                | Result | Rank |
|------------------------------|-------------------------|-----------------------|--------|------|
| Semantic Textual Similarity  | STS-B                   | Spearman's rho (×100) | 65.05  | 136  |
| Word Similarity              | WordSim-353             | Spearman's rho        | 0.3071 | 114  |
| Word Similarity              | MEN                     | Spearman's rho        | 0.4362 | 68   |
| Semantic Similarity          | SimLex                  | Spearman correlation  | 0.1844 | 60   |
| Clustering                   | OpenWebText             | Clustering score      | 0.6222 | 30   |
| Clustering                   | C4                      | Clustering score      | 63.95  | 30   |
| Next-token prediction        | OpenWebText (held-out)  | In-domain PPL         | 18.53  | 30   |
| Next-token prediction        | C4                      | Out-of-domain PPL     | 21.1   | 30   |
| Next-token prediction        | C4 (held-out)           | PPL                   | 21.5   | 30   |
| Next-token prediction        | OpenWebText             | PPL                   | 18.68  | 30   |
