TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents

About

The reasoning, writing, text-editing, and retrieval capabilities of proprietary large language models (LLMs) have advanced rapidly, providing users with an ever-expanding set of functionalities. However, this growing utility has also led to a serious societal concern: the over-reliance on LLMs. In particular, users increasingly delegate tasks such as homework, assignments, or the processing of sensitive documents to LLMs without meaningful engagement. This form of over-reliance and misuse is emerging as a significant social issue. In order to mitigate these issues, we propose a method injecting imperceptible phantom tokens into documents, which causes LLMs to generate outputs that appear plausible to users but are in fact incorrect. Based on this technique, we introduce TRAPDOC, a framework designed to deceive over-reliant LLM users. Through empirical evaluation, we demonstrate the effectiveness of our framework on proprietary LLMs, comparing its impact against several baselines. TRAPDOC serves as a strong foundation for promoting more responsible and thoughtful engagement with language models. Our code is available at https://github.com/jindong22/TrapDoc.

Hyundong Jin, Sicheol Sung, Shinwoo Park, SeungYeop Baik, Yo-Sub Han• 2025

Related benchmarks

Task	Dataset	Result
Watermarking Prevention	Ten-exam benchmark 1.0 (test)	Prevention ASR0.405	20
Watermark Detection	Ten-exam benchmark 1.0 (test)	Detection Score87.9	20
Detection	LongForm	Score (gpt-5.1)100	5
Detection	T/F	GPT-5.1 Score (T/F)82.1	5
Prevention	MCQ	gpt-5.1 Score89.9	5
Prevention	T/F	gpt-5.1 Score83.3	5
Prevention	LongForm	Score (gpt-5.1)83	5
Detection	MCQ	Detection Score98.3	5

Showing 8 of 8 rows

Other info

Follow for update

@wizwand_team Discord