OOD safety category inference (Stage 2) on LlavaGuard
Top result: Gemini2.5-Flash, Reward Mean 13.28 (Dec 29, 2025)
Evaluation Results
Method            Date      Reward Mean
Gemini2.5-Flash   2025.12   13.28
ProGuard-7B       2025.12   5.98
ProGuard-3B       2025.12   5.31
GPT4o-mini        2025.12   0.71