SOTA Real-World Agent benchmarks and papers with code | Wizwand

Real-World Agent

Benchmarks

Dataset Name	SOTA Method	Metric	Trend
Claw-Eval		Average Score 80.6		22	1mo ago
PinchBench		Average Score 82.3		15	1mo ago