
Single-Agent

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Targeted Answer | Single-Agent Evaluation Set | R@5: 100 | 12
Phishing Worm | Single-Agent Evaluation Set | R@5: 100 | 6