
Agent Task Benchmark

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Agentic Task Performance | Agent Task Benchmark 240 documents 1.0 (Evaluation set) | Information Lookup Success Rate: 92.3 | 4