The Illusionist's Prompt: Exposing the Factual Vulnerabilities of Large Language Models with Linguistic Nuances

About

As Large Language Models (LLMs) continue to advance, non-expert users increasingly rely on them as real-time sources of information. To ensure the factuality of that information, much research has focused on mitigating hallucinations in LLM responses, but only for formal user queries rather than maliciously crafted ones. In this study, we introduce The Illusionist's Prompt, a novel hallucination attack that incorporates linguistic nuances into adversarial queries, challenging the factual accuracy of LLMs against five types of fact-enhancing strategies. Our attack automatically generates highly transferable illusory prompts that induce internal factual errors while preserving user intent and semantics. Extensive experiments confirm the effectiveness of our attack in compromising black-box LLMs, including commercial APIs such as GPT-4o and Gemini-2.0, even when various defensive mechanisms are in place.

Yining Wang, Yuquan Wang, Xi Li, Mi Zhang, Geng Hong, Min Yang • 2025

Related benchmarks

Task                              Dataset  Result                    Rank
Package Hallucination Evaluation  LLM_AT   Hallucination ASR: 3.83   16
Package Hallucination Evaluation  LLM_LY   Hallucination ASR: 38.24  16
Package Hallucination Evaluation  SO_AT    Hallucination ASR: 43.48  16
Package Hallucination Evaluation  SO_LY    Hallucination ASR: 48.69  16