SentGuard: Sentence-Level Streaming Guardrails for Large Language Models

About

Large language models increasingly stream long, reasoning-intensive responses in real time, making when to moderate as critical as whether to moderate. Existing guardrails fall into two unsatisfactory extremes: response-level methods delay intervention until the full output is generated, whereas token-level methods act on incomplete semantics, often producing unstable decisions and excessive guard invocations. To address this challenge, we propose SentGuard, a sentence-level streaming guardrail that operates in parallel with generation. A lightweight waiting buffer groups streamed tokens into sentence chunks and releases only verified chunks to the user, introducing a small offset that enables SentGuard to assess the current prefix while the target LLM decodes subsequent content. To support this, we construct StreamSafe, a benchmark with structured per-sentence annotations across 8 harm categories, capturing the evolution of safety risks across both reasoning and response segments. We further train SentGuard with a coarse-to-fine objective to detect unsafe intent as soon as it emerges at sentence boundaries. Experiments on 5 safety benchmarks show that SentGuard outperforms existing baselines, detecting 90.5% of unsafe cases within two sentences while maintaining a low streaming false-positive rate of 7.41%.
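The waiting-buffer idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stream_with_guard`, the `is_safe` callback, and the punctuation-based sentence splitter are all hypothetical stand-ins, and the guard runs sequentially here rather than in parallel with decoding.

```python
import re
from typing import Callable, Iterable, Iterator

# Assumed sentence-boundary rule (not from the paper): terminal punctuation
# followed by whitespace ends a sentence.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_with_guard(
    tokens: Iterable[str], is_safe: Callable[[str], bool]
) -> Iterator[str]:
    """Yield verified sentence chunks; stop at the first chunk the guard rejects.

    `is_safe` is a hypothetical guard that receives the whole prefix seen so
    far (released text plus the candidate sentence) and returns True if safe.
    """
    prefix = ""  # text already verified and released to the user
    buffer = ""  # streamed tokens still waiting for a sentence boundary
    for token in tokens:
        buffer += token
        # Every part except the last is a complete sentence; the last part is
        # the unfinished remainder and stays in the buffer.
        *complete, buffer = _SENTENCE_END.split(buffer)
        for sentence in complete:
            candidate = prefix + sentence + " "
            if not is_safe(candidate):
                return  # withhold this chunk and everything after it
            prefix = candidate
            yield sentence
    # Flush whatever remains when the stream ends.
    if buffer.strip() and is_safe(prefix + buffer):
        yield buffer


if __name__ == "__main__":
    demo_tokens = ["Hi", " there", ".", " This", " is", " bad", ".", " More", "."]
    toy_guard = lambda text: "bad" not in text  # toy keyword guard, illustration only
    for chunk in stream_with_guard(demo_tokens, toy_guard):
        print(chunk)
```

Only the first sentence is released in the demo, because the second chunk fails the toy guard before it reaches the user. A real deployment would replace `toy_guard` with a trained classifier and overlap its call with generation of the next sentence.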

Jiaqi Yu, Xin Wang, Yixu Wang, Jie Li, Yan Teng, Xingjun Ma, Yingchun Wang • 2026

Related benchmarks

Task	Dataset	Metric	Result
Safety Classification	WildGuard (test)	F1 Score	80	35
Streaming Safety Detection	Safe RLHF	Det@1	96.43	8
Streaming Safety Detection	XSTest	Det@1	89.74	8
Streaming Safety Detection	WildGuard (test)	Det@1	83.45	8
Streaming Safety Detection	StreamSafe	Det@1	54.55	8
Streaming Safety Detection	BeaverTails	Det@1	76.34	8
Full-response Safety Guardrail Classification	StreamSafe internal (test)	F1 Score	98.7	7
Full-response Safety Guardrail Classification	Safe-RLHF (test)	F1 Score	92.5	7
Full-response Safety Guardrail Classification	XSTest (test)	F1 Score	91.2	7
Full-response Safety Guardrail Classification	BeaverTails (test)	F1 Score	81.2	7
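
This page does not define the Det@k metric. One plausible reading, consistent with the abstract's "within two sentences" phrasing, is the percentage of unsafe cases that the guard flags within k sentences of the point where unsafe content first appears. The sketch below computes that quantity under this assumption; the function name `det_at_k` and its input layout are hypothetical, and the paper's exact definition may differ.

```python
from typing import Optional, Sequence


def det_at_k(
    first_flag: Sequence[Optional[int]], onset: Sequence[int], k: int
) -> float:
    """Percentage of unsafe cases flagged within k sentences of onset.

    Assumed definition (not taken from the paper):
    first_flag[i] is the index of the first sentence the guard flagged in
    case i (None if it never flagged); onset[i] is the index of the first
    sentence annotated as unsafe. A case counts as detected when
    first_flag[i] - onset[i] < k, so k=1 means "flagged at the onset sentence".
    """
    if len(first_flag) != len(onset):
        raise ValueError("first_flag and onset must have the same length")
    if not onset:
        raise ValueError("at least one unsafe case is required")
    hits = sum(
        1
        for flag, start in zip(first_flag, onset)
        if flag is not None and flag - start < k
    )
    return 100.0 * hits / len(onset)


if __name__ == "__main__":
    # Four toy cases, invented purely for illustration.
    flags = [2, 4, None, 0]
    onsets = [2, 3, 1, 0]
    print(det_at_k(flags, onsets, k=1))
    print(det_at_k(flags, onsets, k=2))
```

Under this reading, raising k can only increase the score, since each additional sentence gives the guard more context before the detection window closes.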

