OOD safety category inference (Stage 2) on ProGuard Text
Top entry: ProGuard-3B, Reward-mean 32.94 (Dec 29, 2025). Updated 3d ago.
Evaluation Results
Method            Date      Reward-mean
ProGuard-3B       2025.12   32.94
ProGuard-7B       2025.12   32.59
Gemini2.5-Flash   2025.12   11.26
GPT4o-mini        2025.12   0.12