| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Agent Task | PinchBench (PB) | Accuracy 100 | 21 |
| Agent task completion | PinchBench | Pass@1 88.7 | 17 |
| Real-World Agent | PinchBench | Average Score 82.3 | 15 |
| Task Performance Evaluation | PinchBench v2.0.0 | Best Score 89.31 | 6 |
| Task Performance Evaluation | PinchBench v1.2.0 | Best Score 90.1 | 6 |
| Agent & OpenClaw | PinchBench | Accuracy 83.7 | 5 |