"I've Seen How This Goes": Characterizing Diversity via Progressive Conditional Surprise

About

Measuring the diversity of creative outputs is central to evaluating post-training mode collapse, comparing decoding strategies, and quantifying creative behavior in both AI and human writing. We propose a new approach to measuring diversity using in-context learning, of which the ``Decan'' metric, $D_{Ca_n} = C \times a_n$, is the working instance we evaluate: a per-byte score read off the per-token log-probabilities of a base model $\theta$ in a \emph{single forward pass} per permutation, with no embedding model, no reference corpus, and no human labels. This approach is grounded in information theory, uses language model in-context learning to detect a wide range of similarities between any number of inputs, and obviates the need to train a special-purpose model. The same pipeline scores AI samples and human-written response sets, with diversity treated as a property of (responses, prompt, scoring model). On Tevet and Berant's human-grounded McDiv benchmark, $D_{Ca_n}$ reaches OCA 0.846 on the prompt\_gen set, where it performs best, behind the strongest neural baseline reported in Tevet and Berant (SentBERT, 0.897). On the OLMo-2-7B post-training pipeline, $D_{Ca_n}$ drops monotonically across the base $\to$ SFT $\to$ DPO $\to$ RLVR stages, detecting the type of diversity loss that creative-writing applications care about.
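To make the "per-byte score read off the per-token log-probabilities" concrete, here is a minimal Python sketch of the general idea only. It does not compute $D_{Ca_n}$: the abstract does not define $C$ or $a_n$, so the sketch stops at a per-position bits-per-byte curve averaged over permutations. The function names `bits_per_byte` and `progressive_surprise`, the input layout, and the toy numbers are all hypothetical, and the log-probabilities are assumed to be natural logs already obtained from some base model.

```python
import math
from statistics import mean


def bits_per_byte(token_logprobs, text):
    """Convert natural-log token log-probabilities into bits per UTF-8 byte."""
    n_bytes = len(text.encode("utf-8"))
    total_bits = -sum(token_logprobs) / math.log(2)
    return total_bits / n_bytes


def progressive_surprise(permutations):
    """Average bits-per-byte at each position across orderings.

    `permutations` is a list of orderings (hypothetical layout). Each ordering
    is a list of (text, token_logprobs) pairs, where token_logprobs are the
    log-probabilities of that response's tokens given the prompt and all
    responses placed before it in that ordering.
    """
    n_positions = len(permutations[0])
    return [
        mean(bits_per_byte(lps, text) for text, lps in (p[k] for p in permutations))
        for k in range(n_positions)
    ]


# Toy, made-up log-probabilities: each token has probability 0.5 or 0.25.
half, quarter = math.log(0.5), math.log(0.25)
orderings = [
    [("abcd", [quarter] * 4), ("abce", [half] * 4)],
    [("abce", [quarter] * 4), ("abcd", [half] * 4)],
]
curve = progressive_surprise(orderings)
```

In a set of near-duplicate responses, later positions would be expected to cost fewer bits per byte than the first, since the model has already seen similar text in context; a diverse set would keep the curve flatter. How that curve is reduced to a single score is left to the paper's own definitions.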

Matthew Khoriaty, David Williams-King, Shi Feng • 2026

Related benchmarks

Task	Dataset	Result
prompt_gen	ConTest 200 with_hds	Spearman ρ 0.686	12
Prompt Generation	DecTest prompt_gen 1000 samples no_hds	Spearman ρ 0.932	7
Response Generation	DecTest resp_gen no_hds (1000 samples)	Spearman ρ 0.924	7
Story Generation	DecTest story_gen no_hds (1000 samples)	Spearman ρ 0.779	7
prompt_gen	McDiv nuggets ~1K, no_hds	Spearman ρ 0.636	6
prompt_gen	McDiv full no_hds ~2K	Spearman ρ 0.729	6
resp_gen	ConTest 200 with_hds	Spearman ρ 0.391	6
resp_gen	McDiv nuggets ~1K no hds	Spearman ρ 0.345	6
story_gen	McDiv_nuggets ~1K no_hds	Spearman ρ 0.317	6
resp_gen	McDiv full no_hds ~2K	Spearman ρ 0.5	6

Showing 10 of 13 rows
