Penetration Testing on Cybench
[Chart: Success Rate by date. Highlighted point: Claude 4.5 Sonnet, Success Rate 55, Dec 10, 2025.]
Updated 3mo ago
Evaluation Results
Method              Framework           Date      Success Rate
Claude 4.5 Sonnet   CyAgent             2025.12   55
ARTEMIS             ARTEMIS             2025.12   48.6
OpenAI GPT-5        A1 (Supervis...     2025.12   45.9
Claude 4.1 Opus     CyAgent             2025.12   38
Claude 4 Opus       CyAgent             2025.12   38
Claude 4 Sonnet     CyAgent             2025.12   35
OpenAI o3-mini      CyAgent             2025.12   22.5