GitLab

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Injection Attack on Web Browser Agent | GitLab Short | UUA | 100 | 16 |
| Instruction Injection Attack on Web Browser Agent | GitLab Medium | UUA | 100 | 16 |
| Instruction Injection Attack on Web Browser Agent | GitLab Long | UUA | 100 | 16 |
| Autonomous Task Completion | GitLab | Success Rate (SR) | 66.2 | 6 |
| Web Task Completion | Gitlab | Accuracy | 87 | 5 |
| Web Agent Automation | Gitlab | End-to-end Latency (seconds) | 9.9 | 5 |