Enterprise interface interaction on WorkArena L2 full benchmark
Top result: GPT-5, Success Rate 69.4 (Apr 9, 2026)
[Chart: Success Rate by date]
Evaluation Results
Method          | Settings                 | Date    | Success Rate
----------------|--------------------------|---------|-------------
GPT-5           | Harness=GenericAgent,... | 2026.04 | 69.4
Claude 4 Sonnet | Harness=GenericAgent,... | 2026.04 | 40.4
GPT-oss-120B    | Harness=GenericAgent,... | 2026.04 | 11.5