Security Analysis on CyberGym
Figure: Resolved Percentage over time (chart not reproduced). Top entry: SageAgent, 60.2. Date shown: Feb 18, 2026.
Evaluation Results
| Method | Configuration | Date | Resolved Percentage (%) |
| --- | --- | --- | --- |
| SageAgent | Model=GPT-5 (medium),... | 2026.02 | 60.2 |
| Anthropic Agent | Model=Claude Opus 4.5,... | 2026.02 | 50.6 |
| OpenHands | Model=GPT-5 (high), AD... | 2026.02 | 39.4 |
| Anthropic Agent | Model=Claude Sonnet 4.... | 2026.02 | 28.9 |
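To make the spread between entries explicit, the following minimal Python sketch ranks the four leaderboard entries and computes each entry's gap to the leader in percentage points. The scores are copied from the results above. The model strings are truncated in the source and are kept exactly as shown. The helper name `gaps_to_leader` is hypothetical and exists only for this illustration.

```python
# Leaderboard entries copied from the results above: (method, model, score).
# Model strings are truncated in the source and are kept as shown.
results = [
    ("SageAgent", "GPT-5 (medium),...", 60.2),
    ("Anthropic Agent", "Claude Opus 4.5,...", 50.6),
    ("OpenHands", "GPT-5 (high), AD...", 39.4),
    ("Anthropic Agent", "Claude Sonnet 4....", 28.9),
]


def gaps_to_leader(rows):
    """Hypothetical helper: sort entries by score (highest first) and
    attach each entry's gap to the top score, in percentage points."""
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    best = ranked[0][2]
    # Round to one decimal so float subtraction noise (e.g. 9.600000000000001)
    # does not leak into the reported gaps.
    return [(method, model, score, round(best - score, 1))
            for method, model, score in ranked]


for method, model, score, gap in gaps_to_leader(results):
    print(f"{method:16} {model:22} {score:5.1f}  gap {gap:.1f} pts")
```

On these numbers, SageAgent leads the Claude Opus 4.5 entry by 9.6 points and the lowest-ranked entry by 31.3 points.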