Robotic Capture-the-Flag on CTF-0 1.0 (test)
[Chart: Success Count and Failure Count over time, with a plotted point dated Mar 25, 2026. Top result: Environment-Grounded Multi-Agent Workflow, Success Count 5.]
Updated 24d ago
Evaluation Results
Method                                     Configuration              Date     Success Count  Failure Count
Environment-Grounded Multi-Agent Workflow  Model=llama-3.3-70b-in...  2026.03  5              0
Environment-Grounded Multi-Agent Workflow  Model=deepseek-v3.2, A...  2026.03  5              0
Environment-Grounded Multi-Agent Workflow  Model=gemma-3-27b-it,...   2026.03  5              0
Environment-Grounded Multi-Agent Workflow  Model=hermes-2-pro-lla...  2026.03  5              0
HackSynth                                  Model=llama-3.3-70b-in...  2026.03  5              0
HackSynth                                  Model=deepseek-v3.2, A...  2026.03  4              1
HackSynth                                  Model=gemma-3-27b-it,...   2026.03  4              1
HackSynth                                  Model=hermes-2-pro-lla...  2026.03  4              1