Computer-Using Agent Task on OS-Harm 1.0 (test)
[Chart: PCR by method. Highlighted point: SAFEPRED, PCR 0.99, Feb 2, 2026.]
Updated 4d ago
Evaluation Results
| Method | Foundation Model | Date | PCR | SR | SUP | Chrome SR | Calc SR | Imp SR | Writ SR | Multi SR | OS SR | Thunderbird SR | VS Code SR | Overall SR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SAFEPRED | GPT-4... | 2026.02 | 0.99 | 0.35 | 0.34 | - | - | - | - | - | - | - | - | - |
| Generic-defense | GPT-4o | 2026.02 | 0.95 | 0.16 | 0.15 | - | - | - | - | - | - | - | - | - |
| HarmonyGuard | GPT-4... | 2026.02 | 0.93 | 0.24 | 0.23 | - | - | - | - | - | - | - | - | - |
| Rule-traversed | GPT-4o | 2026.02 | 0.92 | 0.17 | 0.14 | - | - | - | - | - | - | - | - | - |
| None | GPT-4o | 2026.02 | 0.87 | 0.22 | 0.22 | - | - | - | - | - | - | - | - | - |