OOD safety category inference (Stage 2) on ProGuard Image
Top result: ProGuard-3B, Mean Reward 25.95
[Chart: Mean Reward by date; data point at Dec 29, 2025]
Evaluation Results
Method            Date      Mean Reward
ProGuard-3B       2025.12   25.95
ProGuard-7B       2025.12   18.76
Gemini2.5-Flash   2025.12   4.57
GPT4o-mini        2025.12   0.02