GLiNER Guard: Unified Encoder Family for Production LLM Safety and Privacy
About
Production LLM systems require both safety moderation and PII detection under strict latency and cost constraints. This creates a trade-off: autoregressive moderators are accurate but expensive, while lightweight encoders are faster but less capable. We present GLiNER Guard (GLiGuard), a unified encoder that performs safety classification and PII detection in a single forward pass, simplifying safety pipelines. We introduce three variants: compact uni- and bi-encoders (145-147M parameters) for high-throughput serving, and GLiGuard Omni (209M parameters) for stronger moderation quality. Under dynamic batching on a single A100, the compact model reaches 193 requests/sec with P99 latency below 1 s, achieving 1.6x higher throughput than GLiNER2. Omni remains competitive with much larger moderators on public safety benchmarks. We also release PII-Bench, a span-level benchmark for evaluating PII detection in end-to-end pipelines. Overall, encoder-based guardrails offer a practical low-cost alternative for always-on moderation. Models and benchmarks are released on Hugging Face.
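The serving figures above pair a throughput number (requests/sec) with a tail-latency bound (P99). As a rough illustration of how two such numbers can be derived from a load test, here is a minimal sketch. It is not the authors' benchmarking harness: the function names `p99_latency_ms` and `throughput_rps` are hypothetical, the nearest-rank percentile definition is an assumption, and the latencies are synthetic.

```python
def p99_latency_ms(latencies_ms):
    """Nearest-rank 99th percentile of per-request latencies (milliseconds)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # ceil(0.99 * n) in integer arithmetic, giving a 1-based rank
    rank = -(-99 * n // 100)
    return ordered[rank - 1]


def throughput_rps(num_requests, wall_clock_s):
    """Completed requests divided by the wall-clock duration of the run."""
    if wall_clock_s <= 0:
        raise ValueError("wall-clock time must be positive")
    return num_requests / wall_clock_s


# Synthetic example: 100 requests with latencies 100 ms, 105 ms, ..., 595 ms,
# all completed within an assumed 2-second measurement window.
synthetic_latencies_ms = list(range(100, 600, 5))
p99 = p99_latency_ms(synthetic_latencies_ms)
rps = throughput_rps(len(synthetic_latencies_ms), 2.0)
print(f"P99 latency: {p99} ms, throughput: {rps} requests/sec")
```

Under dynamic batching, throughput and tail latency pull against each other: larger batches raise requests/sec but make individual requests wait longer, which is why the two numbers are reported together.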
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Safety Moderation | StrongREJECT | F1 Score | 99.7 | 15 |
| Safety Moderation | PolyGuard | Prompt F1 | 71.7 | 15 |
| Safety Moderation | Aegis 2.0 | Prompt F1 | 80.2 | 15 |
| Safety Moderation | Aegis, StrongReject, PolyGuard Aggregate 2.0 | Average F1 Score | 76.9 | 15 |
| Safety Classification | AegisSafetyTest V2 | -- | -- | 14 |
| Binary Safety Classification | ToxicChat jailbreaking | Macro F1 | 70.54 | 11 |
| Binary Safety Classification | oai_safety OpenAI moderation | Macro F1 | 67.85 | 11 |
| Inference Efficiency | 1024-token sequences (inference summary) | Throughput (Samples/s) | 34.49 | 11 |
| Binary Safety Classification | wildguard prompt safety | Macro F1 | 72.62 | 11 |
| PII detection | PII-Bench | Name F1 | 83.1 | 10 |