Harmfulness Detection on Aegis 2.0
Top score: 83.4 Macro F1 (SIREN), Apr 20, 2026
Updated 1mo ago
Evaluation Results
Method       Backbone      Date     Macro F1
SIREN        Qwen3-4B      2026.04  83.4
SIREN        Llama3.1-8B   2026.04  82.9
SIREN        Llama3.2-1B   2026.04  82.7
Qwen3Guard   Qwen3-4B      2026.04  82.5
SIREN        Qwen3-0.6B    2026.04  82.1
Qwen3Guard   Qwen3-0.6B    2026.04  82
LlamaGuard3  Llama3.1-8B   2026.04  78
LlamaGuard3  Llama3.2-1B   2026.04  72.6
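
For readers unfamiliar with the metric, Macro F1 is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of how many examples it has. The following is a minimal sketch of that standard definition, not the leaderboard's own evaluation code. The function name `macro_f1`, the binary label encoding (1 = harmful, 0 = safe), and the toy labels are all assumptions made for illustration; the page does not state the exact label set or scoring script used on Aegis 2.0.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one example")
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data (assumed encoding: 1 = harmful, 0 = safe).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

# Scaled by 100 to match the style of the scores in the table.
print(f"Macro F1: {100 * macro_f1(y_true, y_pred):.1f}")
```

Because both classes are averaged with equal weight, a detector that scores well on the majority class but poorly on the minority class is penalized more than it would be under accuracy or micro-averaged F1.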