Robotic Capture-the-Flag on CTF-2
[Figure: Success Count chart. Highlighted point: Environment-Grounded Multi-Agent Workflow, Success Count 5, Mar 25, 2026. Series: Success Count, Failure Count.]
Updated 24d ago
Evaluation Results
Method                                     Configuration              Date     Success Count  Failure Count
Environment-Grounded Multi-Agent Workflow  Model=llama-3.3-70b-in...  2026.03  5              0
Environment-Grounded Multi-Agent Workflow  Model=deepseek-v3.2, A...  2026.03  1              4
Environment-Grounded Multi-Agent Workflow  Model=gemma-3-27b-it,...   2026.03  0              5
Environment-Grounded Multi-Agent Workflow  Model=hermes-2-pro-lla...  2026.03  0              5
HackSynth                                  Model=llama-3.3-70b-in...  2026.03  0              5
HackSynth                                  Model=deepseek-v3.2, A...  2026.03  0              5
HackSynth                                  Model=gemma-3-27b-it,...   2026.03  0              5
HackSynth                                  Model=hermes-2-pro-lla...  2026.03  0              5