| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| HumanEval | Perplexity Filter | False Positive Rate: 0 | 32 | 2mo ago | |
| OR-Bench | Perplexity Filter | False Positive Rate: 0 | 26 | 2mo ago | |
| ULSPB 350 interaction runs | StateGuard (Base-Ensemble) | HS Rate: 0.02 | 24 | 23d ago | |
| AlpacaEval | Llama Guard | False Positive Rate: 0 | 24 | 2mo ago | |
| GSM8K | Perplexity Filter | False Positive Rate: 0 | 24 | 2mo ago | |