When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models

About

Large Language Models (LLMs) have been augmented with web search to overcome the limitations of the static knowledge boundary by accessing up-to-date information from the open Internet. While this integration enhances model capability, it also introduces a distinct safety threat surface: the retrieval and citation process has the potential risk of exposing users to harmful or low-credibility web content. Existing red-teaming methods are largely designed for standalone LLMs as they primarily focus on unsafe generation, ignoring risks emerging from the complex search workflow. To address this gap, we propose CREST-Search, a pioneering red-teaming framework for LLMs with web search. The cornerstone of CREST-Search is three novel attack strategies that generate seemingly benign search queries yet induce unsafe citations. It also employs an iterative in-context refinement mechanism to strengthen adversarial effectiveness under black-box constraints. In addition, we construct a search-specific harmful dataset, WebSearch-Harm, which enables fine-tuning a specialized red-teaming model to improve query quality. Our experiments demonstrate that CREST-Search can effectively bypass safety filters and systematically expose vulnerabilities in web search-based LLM systems, underscoring the necessity of the development of robust search models.

Haoran Ou, Kangjie Chen, Xingshuo Han, Gelei Deng, Jie Zhang, Han Qiu, Tianwei Zhang• 2025

Related benchmarks

Task	Dataset	Result	Rank
Red Teaming	Web-Augmented LLM Red-Teaming Evaluation Set	Detection Rate80.5		5

Showing 1 of 1 rows

Other info

Follow for update

@wizwand_team Discord