Black-Box Detection of LLM-Generated Text Using Generalized Jensen-Shannon Divergence

About

We study black-box detection of machine-generated text under practical constraints: the scoring model (proxy LM) may mismatch the unknown source model, and per-input contrastive generation is costly. We propose SurpMark, a reference-based detector that summarizes a passage by the dynamics of its token surprisals. SurpMark discretizes surprisals into interpretable states, estimates a state-transition matrix for the test text, and scores it via a generalized Jensen-Shannon (GJS) gap between the test transitions and two fixed references (human vs. machine) built once from existing corpora. Theoretically, we derive design guidance for how the discretization bins should scale with data and provide a principled justification for our test statistic. Empirically, across multiple datasets, source models, and scenarios, SurpMark consistently matches or surpasses baselines, demonstrating strong robustness across domains and generators; our experiments on hyperparameter sensitivity exhibit trends that our theoretical results help to explain.
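The pipeline described above (discretize surprisals, estimate a transition matrix, compare against two fixed references) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The fixed bin edges, the additive smoothing, the row-averaged comparison, and the particular weighted Jensen-Shannon form used here are all assumptions, and the paper's exact GJS definition may differ. All function names are hypothetical. Surprisals are taken as given, so no proxy LM is called.

```python
import numpy as np

def discretize(surprisals, edges):
    """Map each token surprisal to a state index using fixed bin edges (assumed)."""
    return np.digitize(np.asarray(surprisals, dtype=float), edges)

def transition_matrix(states, k, smoothing=1.0):
    """Row-normalized first-order transition counts.

    Additive smoothing (an assumption) keeps every row strictly positive;
    smoothing must be > 0 so that no row sums to zero.
    """
    counts = np.full((k, k), smoothing, dtype=float)
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def kl(p, q):
    """KL divergence D(p || q) in nats, skipping zero-mass entries of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def weighted_js(p, q, alpha=0.5):
    """One common weighted Jensen-Shannon form (assumed, not the paper's)."""
    m = alpha * p + (1.0 - alpha) * q
    return alpha * kl(p, m) + (1.0 - alpha) * kl(q, m)

def matrix_divergence(a, b, alpha=0.5):
    """Average the row-wise divergence between two transition matrices."""
    return float(np.mean([weighted_js(ra, rb, alpha) for ra, rb in zip(a, b)]))

def gap_score(test, ref_human, ref_machine, alpha=0.5):
    """Positive when the test matrix is closer to the machine reference."""
    return (matrix_divergence(test, ref_human, alpha)
            - matrix_divergence(test, ref_machine, alpha))

# Toy usage with made-up surprisal sequences (illustrative values only).
edges = np.array([1.0, 3.0])   # hypothetical bin edges, giving 3 states
k = len(edges) + 1
human_ref = transition_matrix(
    discretize([0.5, 5.0, 0.5, 5.0, 2.0, 5.0, 0.5], edges), k)
machine_ref = transition_matrix(
    discretize([0.5, 0.5, 0.5, 2.0, 0.5, 0.5, 0.5], edges), k)
test = transition_matrix(discretize([0.4, 0.6, 2.5, 0.3, 0.2, 0.7], edges), k)
score = gap_score(test, human_ref, machine_ref)
```

In this sketch the two reference matrices are built once and reused for every test passage, which mirrors the abstract's point that no per-input contrastive generation is needed. A decision threshold on `score` would have to be chosen on held-out data.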

Shuangyi Chen, Ashish Khisti • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Detection of LLM generated text | WritingPrompts GPT-J-6B | AUROC 97.6 | 15 |
| Detection of LLM generated text | XSum GPT-J-6B | AUROC 88.35 | 15 |
| LLM-generated text detection | Xsum, WritingPrompts, and SQuAD Gemini-1.5-Flash (test) | AUROC 75.14 | 15 |
| LLM-generated text detection | Xsum, WritingPrompts, and SQuAD generated by GPT-4.1-mini (test) | AUROC 80.25 | 15 |
| LLM-generated text detection | Xsum, WritingPrompts, and SQuAD generated by GPT-5-Chat (test) | AUROC 81.33 | 15 |
| LLM-generated text detection | Xsum, WritingPrompts, and SQuAD Aggregated (test) | GPT2-XL 98.35 | 15 |
| Machine-generated text detection | DetectRL-arXiv cross-source corruption (test) | AUROC 93.86 | 9 |
| LLM-generated text detection | XSum GPT-5-Chat | TPR @ FPR=1% 31.33 | 3 |
| LLM-generated text detection | WritingPrompts GPT-4.1-mini | TPR @ FPR=1% 31.33 | 3 |
| LLM-generated text detection | WritingPrompts Llama3-8B | TPR @ FPR=1% 100 | 3 |
Showing 10 of 25 rows
