Segmenting Human-LLM Co-authored Text via Change Point Detection

About

The rise of large language models (LLMs) has created an urgent need to distinguish between human-written and LLM-generated text to ensure authenticity and societal trust. Existing detectors typically provide a binary classification for an entire passage; however, this is insufficient for human-LLM co-authored text, where the objective is to localize specific segments authored by humans or LLMs. To bridge this gap, we propose algorithms to segment text into human- and LLM-authored pieces. Our key observation is that such a segmentation task is conceptually similar to classical change point detection in time-series analysis. Leveraging this analogy, we adapt change point detection to LLM-generated text detection, develop a weighted algorithm and a generalized algorithm to accommodate heterogeneous detection score variability, and establish the minimax optimality of our procedure. Empirically, we demonstrate the strong performance of our approach against a wide range of existing baselines. The Python implementation of our proposal is available at https://github.com/Mamba413/DetectLLMSegmentation.

Mengchu Li, Jin Zhu, Jinglai Li, Chengchun Shi • 2026
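
The analogy in the abstract between text segmentation and change point detection can be illustrated with a minimal sketch. It assumes each sentence already has a numeric detector score (higher meaning more LLM-like) and looks for the single split that best separates the scores into two constant-mean pieces, with optional positive weights to down-weight noisier scores. The function name `single_change_point`, the example scores, and the weighting scheme are assumptions made for illustration. This is a textbook weighted least-squares split, not the authors' algorithm or the API of their repository.

```python
from itertools import accumulate


def single_change_point(scores, weights=None):
    """Return k (1 <= k <= n - 1) so that scores[:k] and scores[k:] are the two segments.

    Hypothetical helper: picks the split that maximizes a weighted
    CUSUM-type contrast, which is equivalent to minimizing the weighted
    within-segment sum of squares around each segment's mean.
    """
    x = [float(s) for s in scores]
    n = len(x)
    if n < 2:
        raise ValueError("need at least two scores")
    w = [1.0] * n if weights is None else [float(v) for v in weights]
    if len(w) != n or any(v <= 0 for v in w):
        raise ValueError("weights must be positive and match scores in length")

    # Prefix sums let every left/right weighted mean be read off in O(1).
    cum_w = list(accumulate(w))
    cum_wx = list(accumulate(wi * xi for wi, xi in zip(w, x)))
    total_w, total_wx = cum_w[-1], cum_wx[-1]

    best_k, best_stat = 1, -1.0
    for k in range(1, n):
        left_w = cum_w[k - 1]
        right_w = total_w - left_w
        left_mean = cum_wx[k - 1] / left_w
        right_mean = (total_wx - cum_wx[k - 1]) / right_w
        # Large when the two segment means differ and both segments carry weight.
        stat = left_w * right_w / total_w * (left_mean - right_mean) ** 2
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k


# Made-up per-sentence scores: four human-like sentences, then three LLM-like ones.
scores = [0.10, 0.20, 0.15, 0.10, 0.90, 0.85, 0.95]
print(single_change_point(scores))  # 4
```

In this sketch, sentences before index 4 would be assigned to one author and the rest to the other. Extending it to several change points (the K > 1 settings listed in the benchmarks) would need a search procedure such as binary segmentation or dynamic programming, which is not shown here.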

Related benchmarks

Task	Dataset	Metric	Result
Single change-point detection	WikiQA	Word Distance (WD)	0.227	14
Single change-point detection	News	WD	0.12	12
Single change-point detection	Story	WD	0.207	12
Change Point Detection	CoAuthor	WD	0.36	7
Multiple change-point detection	Story dataset Claude 4.5 K=1	WD	0.26	6
Multiple change-point detection	Story dataset Claude 4.5 K=2	WD	0.29	6
Multiple change-point detection	Story dataset Claude 4.5 K=3	WD	0.31	6
Multiple change-point detection	Story dataset Claude 4.5 K=5	WD	0.32	6
Multiple change-point detection	Story dataset GPT-5-mini K=1	WD	0.39	6
Multiple change-point detection	Story dataset GPT-5-mini K=2	WD	0.41	6

Showing 10 of 14 rows
