Policy Violation Detection on DynaBench (test)
[Chart: F1 Score by date. Best result: Activation-Space Whitening, F1 Score 86 (Dec 3, 2025)]
Updated 3mo ago
Evaluation Results
Method                      Details                    Date     F1 Score
Activation-Space Whitening  Base Model=Qwen2.5-7B-...  2025.12  86
Activation-Space Whitening  Base Model=Qwen3-8B, A...  2025.12  78.4
Activation-Space Whitening  Base Model=Llama-3.1-8...  2025.12  75.6
Activation-Space Whitening  Base Model=Gemma-2-9B-...  2025.12  75.2
DynaGuard-8B                Approach=Fine-tuned        2025.12  73.1
DynaGuard-8B (non-CoT)      Approach=Fine-tuned        2025.12  72.5
DynaGuard-4B                Approach=Fine-tuned        2025.12  72
GPT-4o-mini                 Approach=LLM-as-a-judge    2025.12  70.1
Activation-Space Whitening  Base Model=Mistral-7B-...  2025.12  66.8
DynaGuard-1.7B              Approach=Fine-tuned        2025.12  65.2
Qwen3-8B                    Approach=LLM-as-a-judge    2025.12  60.7
LlamaGuard-3                Approach=Fine-tuned        2025.12  20.9