"I've Seen How This Goes": Characterizing Diversity via Progressive Conditional Surprise

About

Measuring the diversity of creative outputs is central to evaluating post-training mode collapse, comparing decoding strategies, and quantifying creative behavior in both AI and human writing. We propose a new approach to measuring diversity using in-context learning, of which the ``Decan'' metric, $D_{Ca_n} = C \times a_n$, is the working instance we evaluate: a per-byte score read off the per-token log-probabilities of a base model $\theta$ in a \emph{single forward pass} per permutation, with no embedding model, no reference corpus, and no human labels. This approach is grounded in information theory, makes use of language model in-context learning to detect a wide range of similarities between any number of inputs, and obviates the need to train a special-purpose model. The same pipeline scores AI samples and human-written response sets, with diversity treated as a property of (responses, prompt, scoring model). On Tevet and Berant's human-grounded McDiv benchmark, $D_{Ca_n}$ reaches OCA 0.846 on the McDiv prompt\_gen set where it performs best, behind the strongest neural baseline reported in Tevet and Berant (SentBERT, 0.897). On the OLMo-2-7B post-training pipeline, $D_{Ca_n}$ drops monotonically across the base $\to$ SFT $\to$ DPO $\to$ RLVR stages, detecting the type of diversity loss that creative-writing applications care about.
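To make the "per-byte score read off the per-token log-probabilities" idea concrete, here is a minimal sketch of how a progressive per-byte surprisal curve could be computed. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define $C$ or $a_n$, so the sketch does not compute $D_{Ca_n}$ itself. The names `per_byte_surprisal`, `progressive_curve`, and `score_fn` are hypothetical, and `score_fn` stands in for one forward pass of a base model over the responses concatenated in a given order. The toy scorer replaces a real model so the example runs on its own.

```python
import math
from itertools import permutations


def per_byte_surprisal(token_logprobs, n_bytes):
    """Bits per byte for one response: -sum(ln p) / (ln 2 * n_bytes)."""
    if n_bytes <= 0:
        raise ValueError("response must contain at least one byte")
    return -sum(token_logprobs) / (math.log(2) * n_bytes)


def progressive_curve(score_fn, responses):
    """Mean bits per byte at each context position, averaged over orderings.

    score_fn (hypothetical) takes a tuple of responses in order and returns,
    for each response, its natural-log token probabilities conditioned on
    everything earlier in the tuple: one call per permutation.
    Enumerating all orderings is factorial in len(responses); a real run
    would presumably sample a few permutations instead (assumption).
    """
    totals = [0.0] * len(responses)
    count = 0
    for perm in permutations(responses):
        logprobs_per_response = score_fn(perm)
        for pos, (resp, lps) in enumerate(zip(perm, logprobs_per_response)):
            totals[pos] += per_byte_surprisal(lps, len(resp.encode("utf-8")))
        count += 1
    return [t / count for t in totals]


def toy_score(perm):
    """Stand-in for a language model: one 'token' per character.

    A character costs 2 bits the first time a response appears and
    1 bit if the identical response was already seen in context.
    """
    seen, out = set(), []
    for resp in perm:
        p = 0.5 if resp in seen else 0.25
        out.append([math.log(p)] * len(resp))
        seen.add(resp)
    return out


repeated = progressive_curve(toy_score, ["ab", "ab"])
distinct = progressive_curve(toy_score, ["ab", "cd"])
print("repeated responses:", repeated)
print("distinct responses:", distinct)
```

With the toy scorer, a set of identical responses becomes cheaper to encode once one copy is in context (about 2 bits per byte falling to about 1), while a set of distinct responses stays flat. That drop in conditional surprise is the kind of signal a diversity score of this family can read off.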

Matthew Khoriaty, David Williams-King, Shi Feng • 2026

Related benchmarks

| Task | Dataset | Result (Spearman ρ) | Rank |
| --- | --- | --- | --- |
| prompt_gen | ConTest 200 with_hds | 0.686 | 12 |
| Prompt Generation | DecTest prompt_gen 1000 samples no_hds | 0.932 | 7 |
| Response Generation | DecTest resp_gen no_hds (1000 samples) | 0.924 | 7 |
| Story Generation | DecTest story_gen no_hds (1000 samples) | 0.779 | 7 |
| prompt_gen | McDiv nuggets ~1K, no_hds | 0.636 | 6 |
| prompt_gen | McDiv full no_hds ~2K | 0.729 | 6 |
| resp_gen | ConTest 200 with_hds | 0.391 | 6 |
| resp_gen | McDiv nuggets ~1K no hds | 0.345 | 6 |
| story_gen | McDiv_nuggets ~1K no_hds | 0.317 | 6 |
| resp_gen | McDiv full no_hds ~2K | 0.5 | 6 |
Showing 10 of 13 rows
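
The results above are Spearman rank correlations between a diversity metric's scores and reference diversity labels. As a reminder of what that number measures, the sketch below computes Spearman's ρ as the Pearson correlation of average ranks, using only the standard library. The function names and the example scores are hypothetical and are not taken from the benchmarks.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman_rho(metric_scores, reference_scores):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(metric_scores), average_ranks(reference_scores))


# Made-up scores for four response sets (illustration only).
metric = [0.2, 0.9, 0.5, 0.7]
reference = [1.0, 4.0, 2.0, 3.0]
print(spearman_rho(metric, reference))
```

Because only the ordering matters, a metric can reach ρ = 1 without matching the reference scale, which is why rank correlation suits comparisons between a per-byte score and human diversity judgments.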
