Black-Box Detection of LLM-Generated Text Using Generalized Jensen-Shannon Divergence
About
We study black-box detection of machine-generated text under practical constraints: the scoring model (proxy LM) may mismatch the unknown source model, and per-input contrastive generation is costly. We propose SurpMark, a reference-based detector that summarizes a passage by the dynamics of its token surprisals. SurpMark discretizes surprisals into interpretable states, estimates a state-transition matrix for the test text, and scores it via a generalized Jensen-Shannon (GJS) gap between the test transitions and two fixed references (human vs. machine) built once from existing corpora. Theoretically, we derive design guidance for how the discretization bins should scale with data and provide a principled justification for our test statistic. Empirically, across multiple datasets, source models, and scenarios, SurpMark consistently matches or surpasses baselines, demonstrating strong robustness across domains and generators; our experiments on hyperparameter sensitivity exhibit trends that our theoretical results help to explain.
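The pipeline described above (discretize surprisals, count state transitions, compare against two fixed references) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the fixed bin edges, the additive smoothing, and the use of a joint distribution over consecutive state pairs (rather than a row-stochastic matrix) are all assumptions. The divergence shown is a weighted Jensen-Shannon form, which may differ from the exact GJS definition used in the paper. Surprisals are assumed to be precomputed by a proxy LM.

```python
import numpy as np

def discretize(surprisals, edges):
    """Map each token surprisal to a state index in 0..len(edges)."""
    return np.digitize(surprisals, edges)

def transition_matrix(states, k, smoothing=1e-3):
    """Smoothed joint distribution over consecutive state pairs (s_t, s_{t+1})."""
    counts = np.full((k, k), smoothing, dtype=float)
    np.add.at(counts, (states[:-1], states[1:]), 1.0)
    return counts / counts.sum()

def kl(p, q):
    """KL divergence D(p || q) over flattened distributions, in nats."""
    p, q = p.ravel(), q.ravel()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def weighted_js(p, q, w=0.5):
    """Weighted Jensen-Shannon divergence (assumed stand-in for GJS)."""
    m = w * p + (1.0 - w) * q
    return w * kl(p, m) + (1.0 - w) * kl(q, m)

def gap_score(test, ref_human, ref_machine, w=0.5):
    """Positive when the test transitions sit closer to the machine reference."""
    return weighted_js(test, ref_human, w) - weighted_js(test, ref_machine, w)

# Toy usage with hand-picked surprisals and three states (two bin edges).
edges = np.array([1.0, 3.0])
k = len(edges) + 1
human_ref = transition_matrix(discretize(np.array([0.5, 5.0, 0.4, 4.2, 0.3, 6.1]), edges), k)
machine_ref = transition_matrix(discretize(np.array([2.0, 1.5, 2.2, 1.8, 2.5, 1.1, 2.9]), edges), k)
test = transition_matrix(discretize(np.array([1.9, 2.1, 2.4, 1.2]), edges), k)
score = gap_score(test, human_ref, machine_ref)
```

In this sketch the two references are built once and reused for every test passage, which mirrors the abstract's point that no per-input contrastive generation is needed.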
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Detection of LLM generated text | WritingPrompts GPT-J-6B | AUROC 97.6 | 15 |
| Detection of LLM generated text | XSum GPT-J-6B | AUROC 88.35 | 15 |
| LLM-generated text detection | XSum, WritingPrompts, and SQuAD Gemini-1.5-Flash (test) | AUROC 75.14 | 15 |
| LLM-generated text detection | XSum, WritingPrompts, and SQuAD generated by GPT-4.1-mini (test) | AUROC 80.25 | 15 |
| LLM-generated text detection | XSum, WritingPrompts, and SQuAD generated by GPT-5-Chat (test) | AUROC 81.33 | 15 |
| LLM-generated text detection | XSum, WritingPrompts, and SQuAD Aggregated (test) | GPT2-XL 98.35 | 15 |
| Machine-generated text detection | DetectRL-arXiv cross-source corruption (test) | AUROC 93.86 | 9 |
| LLM-generated text detection | XSum GPT-5-Chat | TPR @ FPR=1% 31.33 | 3 |
| LLM-generated text detection | WritingPrompts GPT-4.1-mini | TPR @ FPR=1% 31.33 | 3 |
| LLM-generated text detection | WritingPrompts Llama3-8B | TPR @ FPR=1% 100 | 3 |