Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents

About

The reasoning, writing, text-editing, and retrieval capabilities of proprietary large language models (LLMs) have advanced rapidly, providing users with an ever-expanding set of functionalities. However, this growing utility has also led to a serious societal concern: the over-reliance on LLMs. In particular, users increasingly delegate tasks such as homework, assignments, or the processing of sensitive documents to LLMs without meaningful engagement. This form of over-reliance and misuse is emerging as a significant social issue. In order to mitigate these issues, we propose a method injecting imperceptible phantom tokens into documents, which causes LLMs to generate outputs that appear plausible to users but are in fact incorrect. Based on this technique, we introduce TRAPDOC, a framework designed to deceive over-reliant LLM users. Through empirical evaluation, we demonstrate the effectiveness of our framework on proprietary LLMs, comparing its impact against several baselines. TRAPDOC serves as a strong foundation for promoting more responsible and thoughtful engagement with language models. Our code is available at https://github.com/jindong22/TrapDoc.

Hyundong Jin, Sicheol Sung, Shinwoo Park, SeungYeop Baik, Yo-Sub Han• 2025

Related benchmarks

TaskDatasetResultRank
Watermarking PreventionTen-exam benchmark 1.0 (test)
Prevention ASR0.405
20
Watermark DetectionTen-exam benchmark 1.0 (test)
Detection Score87.9
20
DetectionLongForm
Score (gpt-5.1)100
5
DetectionT/F
GPT-5.1 Score (T/F)82.1
5
PreventionMCQ
gpt-5.1 Score89.9
5
PreventionT/F
gpt-5.1 Score83.3
5
PreventionLongForm
Score (gpt-5.1)83
5
DetectionMCQ
Detection Score98.3
5
Showing 8 of 8 rows

Other info

Follow for update