
Enterprise interface interaction on WorkArena L2 full benchmark

69.4 Success Rate

GPT-5

Updated 9d ago

Evaluation Results

Method	Links	Success Rate
GPT-5 2026.04		69.4
Claude 4 Sonnet 2026.04		40.4
GPT-oss-120B 2026.04		11.5