Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood

About

Human and model-generated texts can be distinguished by examining the magnitude of likelihood in language. However, this is becoming increasingly difficult as language models' capabilities for generating human-like text keep evolving. This study provides a new perspective by using relative likelihood values instead of absolute ones, and by extracting useful features from the spectrum view of likelihood for the human-model text detection task. We propose a detection procedure with two classification methods, one supervised and one heuristic-based, which achieves performance competitive with previous zero-shot detection methods and a new state of the art on short-text detection. Our method can also reveal subtle differences between human and model languages, which find theoretical roots in psycholinguistics studies. Our code is available at https://github.com/CLCS-SUSTech/FourierGPT
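As a rough illustration of the "relative likelihood" and "spectrum view" ideas in the abstract, the sketch below z-score normalizes a sequence of per-token log-probabilities and takes the magnitude of its discrete Fourier transform. It is only a minimal sketch under stated assumptions, not the authors' implementation: the function names `relative_likelihood` and `likelihood_spectrum` are hypothetical, z-scoring is assumed as the normalization, and the log-probabilities are hard-coded placeholder values that would in practice come from a scoring language model. The linked repository is the authoritative reference.

```python
import numpy as np


def relative_likelihood(log_probs):
    """Z-score a sequence of token log-probabilities (assumed normalization)."""
    x = np.asarray(log_probs, dtype=float)
    std = x.std()
    if std == 0.0:
        # A constant sequence carries no relative information.
        return np.zeros_like(x)
    return (x - x.mean()) / std


def likelihood_spectrum(log_probs):
    """Return (frequencies, magnitudes) of the normalized likelihood sequence."""
    z = relative_likelihood(log_probs)
    magnitudes = np.abs(np.fft.rfft(z))  # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(len(z))      # in cycles per token
    return freqs, magnitudes


# Placeholder per-token log-probabilities (illustrative values only).
example_log_probs = [-2.1, -0.3, -4.5, -1.2, -0.8, -3.3, -0.5, -2.7]
freqs, magnitudes = likelihood_spectrum(example_log_probs)
```

Because the sequence is normalized before the transform, shifting or positively rescaling the raw log-probabilities leaves the spectrum unchanged, which is one way to read "relative" as opposed to "absolute" likelihood. The resulting magnitudes could then serve as input features for a supervised or heuristic classifier.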

Yang Xu, Yu Wang, Hao An, Zhichen Liu, Yongyuan Li • 2024

Related benchmarks

Task	Dataset	Result
Machine-generated text detection	TruthfulQA	TPR@FPR-1% (ChatGLM): 98.15	54
Machine-generated text detection	Essay (test)	GPT4All Score: 98.13	39
AI-generated text detection	Essay	AUROC (GPT4All): 99.68	35
LLM-generated text detection	EvoBench	LLaMA3 Score: 63.99	26
Machine-generated text detection	MAGE	AUROC (Avg): 60.34	24
LLM-generated text detection	Xsum, WritingPrompts, and SQuAD generated by GPT-5-Chat (test)	AUROC: 64.82	15
LLM-generated text detection	Xsum, WritingPrompts, and SQuAD generated by Gemini-1.5-Flash (test)	AUROC: 61.25	15
LLM-generated text detection	Xsum, WritingPrompts, and SQuAD generated by GPT-4.1-mini (test)	AUROC: 63.05	15
LLM-generated text detection	Xsum, WritingPrompts, and SQuAD Aggregated (test)	GPT2-XL: 54.72	15
Machine-generated text detection	DetectRL Training Text: ChatGPT	--	12

Showing 10 of 13 rows
