PinchBench

Benchmarks

Task Name | Dataset Name | Metric | SOTA Result | Trend
--- | --- | --- | --- | ---
Agent Task | PinchBench (PB) | Accuracy | 100 | 21
Agent task completion | PinchBench | Pass@1 | 88.7 | 17
Real-World Agent | PinchBench | Average Score | 82.3 | 15
Task Performance Evaluation | PinchBench v2.0.0 | Best Score | 89.31 | 6
Task Performance Evaluation | PinchBench v1.2.0 | Best Score | 90.1 | 6
Agent & OpenClaw | PinchBench | Accuracy | 83.7 | 5