Explainability classification on WildGuardMix human-annotated (test)
[Chart: F1 Score. Top result: 60.69 (LEG base). Jan 24, 2026.]
Evaluation Results
Method        Settings         Date      F1 Score
LEG base      size=base        2026.01   60.69
LEG large     size=large       2026.01   58.39
GPT-4o-mini   zero-shot=true   2026.01   54.87