Capture The Flag on 30 CTF Challenges
[Chart: Success Rate (30 Challenges). Highlighted value: 0.6333 (Executor, Apr 29, 2026). Selectable metrics: Success Rate (30 Challenges), Average Steps, Average Cost, Average Duration (s).]
Evaluation Results
| Method | Model | Date | Success Rate (30 Challenges) | Average Steps | Average Cost | Average Duration (s) |
|---|---|---|---|---|---|---|
| Executor | GPT-5 | 2026.04 | 0.6333 | 31.56 | 0.9 | 336.74 |
| Executor + Evaluator | GPT-5 | 2026.04 | 0.6333 | 28.76 | 0.64 | 799.3 |
| Planner + Executor + Evaluator | GPT-5 | 2026.04 | 0.6333 | 24.09 | 0.59 | 925.8 |
| claude-code | Opus-4-5 | 2026.04 | 0.6333 | 45.52 | 1.26 | 284.72 |
| Executor | GPT-4.1 | 2026.04 | 0.3 | 36.74 | 1.43 | 162.13 |
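The success rates above are fractions of the 30 challenges, so they can be converted back into solved counts. The sketch below does that and also derives a rough cost per solved challenge. It assumes that Average Cost is reported per challenge attempt, which the page does not state, and it leaves the currency unit unspecified for the same reason. The names `RESULTS`, `solved_count`, and `cost_per_solve` are illustrative and not part of the benchmark.

```python
TOTAL_CHALLENGES = 30  # from the benchmark title

# (method, model, success_rate, average_cost) copied from the results above
RESULTS = [
    ("Executor", "GPT-5", 0.6333, 0.9),
    ("Executor + Evaluator", "GPT-5", 0.6333, 0.64),
    ("Planner + Executor + Evaluator", "GPT-5", 0.6333, 0.59),
    ("claude-code", "Opus-4-5", 0.6333, 1.26),
    ("Executor", "GPT-4.1", 0.3, 1.43),
]


def solved_count(success_rate, total=TOTAL_CHALLENGES):
    """Convert a success rate into a whole number of solved challenges."""
    return round(success_rate * total)


def cost_per_solve(average_cost, success_rate):
    """Average cost divided by success rate.

    Assumption: average_cost is the mean cost of one challenge attempt,
    so dividing by the success rate gives the expected cost per solve.
    """
    if success_rate <= 0:
        raise ValueError("success_rate must be positive")
    return average_cost / success_rate


for method, model, rate, cost in RESULTS:
    print(
        f"{method} ({model}): "
        f"{solved_count(rate)}/{TOTAL_CHALLENGES} solved, "
        f"{cost_per_solve(cost, rate):.2f} per solve"
    )
```

Under that assumption, a success rate of 0.6333 corresponds to 19 of 30 challenges and 0.3 to 9 of 30, so the four top entries differ only in steps, cost, and duration, not in the number of challenges solved.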