Text-based safety moderation on WildGuard
Best F1 Score: 78.6 (OMNIGUARD-7B)
[Chart: F1 Score and Accuracy by date; data point at Dec 2, 2025]
Updated 4d ago
Evaluation Results
| Method | Size | Date | F1 Score | Accuracy |
|---|---|---|---|---|
| OMNIGUARD-7B | 7B | 2025.12 | 78.6 | 92.4 |
| ThinkGuard | 8B | 2025.12 | 78.5 | 92.5 |
| GPT-4o | - | 2025.12 | 78 | 89.1 |
| Qwen3-235B | 235B | 2025.12 | 74.4 | 90.2 |
| LLaMA Guard 3 | 8B | 2025.12 | 73.5 | 91.6 |
| LLaMA-3.3-70B | 70B | 2025.12 | 72.5 | 89.2 |
| Qwen2.5-72B | 72B | 2025.12 | 72.2 | 89.1 |
| OMNIGUARD-3B | 3B | 2025.12 | 70.2 | 87.7 |
| LLaMA Guard 2 | 8B | 2025.12 | 68.9 | 89.9 |
| Qwen2.5-7B | 7B | 2025.12 | 63.3 | 86.2 |
| Qwen2.5-Omni-7B | 7B | 2025.12 | 62 | 83.1 |
| LLaMA Guard 1 | 7B | 2025.12 | 16.4 | 85.1 |