OpenClaw

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Contextual vulnerability assessment | OpenClaw | Risk 1 AGS Score | 93 | 9 |
| Long-term state poisoning evaluation | OpenClaw Injection Tool | Harm Score (HS) | 3.66 | 8 |
| Prompt Injection | OpenClaw (140 adversarial instances) | Defense Success Rate | 90 | 7 |
| Malicious command interception | OpenClaw 1.0 (test) | Defense Rate | 83 | 6 |
| Long-term state poisoning evaluation | OpenClaw ZH | Harm Score (HS) | 3.82 | 4 |
| Long-term state poisoning evaluation | OpenClaw EN | Harm Score (HS) | 3.76 | 4 |
| Long-term state poisoning evaluation | OpenClaw Average across conversation variants | Harm Score (HS) | 4.35 | 4 |
| Long-term state poisoning evaluation | OpenClaw Web Content | Harm Score (HS) | 4.03 | 4 |
| Long-term state poisoning evaluation | OpenClaw Log Replay | Harm Score (HS) | 4.34 | 4 |
| Long-term state poisoning evaluation | OpenClaw Routine | Harm Score (HS) | 3.9 | 4 |
| Threat Detection | OpenClaw 140 adversarial instances | Defense Success Rate | 85 | 4 |
| Credential Leakage | OpenClaw 140 adversarial instances | Defense Success Rate | 85 | 4 |
| Remote Code Execution Attack Success Rate | OpenClaw | C-F Score | 65 | 3 |
| Malicious Skill | OpenClaw (140 adversarial instances) | Defense Success Rate | 90 | 3 |
| Configuration Modification | OpenClaw 140 adversarial instances | Defense Success Rate | 90 | 3 |
| End-to-End Safety Blocking | OpenClaw | Parent Runs | 539 | 2 |
| Dangerous Command | OpenClaw 140 adversarial instances | Defense Success Rate | 90 | 2 |
| Autonomous Research Agent Capability Assessment | OpenCLAW-P2P | - | - | 0 |