OOD safety category inference (Stage 2) on ProGuard Text-Image
Top result: ProGuard-7B, Mean Reward 26.86 (Dec 29, 2025)
Evaluation Results
Method            Date      Mean Reward
ProGuard-7B       2025.12   26.86
ProGuard-3B       2025.12   20.89
Gemini2.5-Flash   2025.12   4.43
GPT4o-mini        2025.12   0