Vulnerability Discovery on TerminalBench 2 snapshot 2026-04-17
[Chart: Score (%) by date; AgentFlow at 84.3 on Apr 22, 2026]
Updated 1mo ago
Evaluation Results
Method          Configuration                Date      Score (%)
AgentFlow       Backbone=Claude Opus 4...    2026.04   84.3
ForgeCode       Backbone=Claude Opus 4...    2026.04   81.4
Capy            Backbone=Claude Opus 4...    2026.04   77.7
Terminus-KIRA   Backbone=Claude Opus 4...    2026.04   77.3
Meta-Harness    Backbone=Claude Opus 4...    2026.04   76.4
TongAgents      Backbone=Claude Opus 4...    2026.04   74.6
Droid           Backbone=Claude Opus 4...    2026.04   72.4
Mux             Backbone=Claude Opus 4...    2026.04   69
Crux            Backbone=Claude Opus 4...    2026.04   66.9
Terminus 2      Backbone=Claude Opus 4...    2026.04   65.6
Claude Code     Backbone=Claude Opus 4...    2026.04   60.9