
ScienceAgentBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Scientific Agent Task Completion | ScienceAgentBench | Success Rate (SR): 43.1 | 10 |
| Scientific Code Generation | ScienceAgentBench | SR: 25.5 | 10 |
| Scientific Code Generation | ScienceAgentBench (test) | SR: 27.5 | 8 |
| Scientific Agent Task | ScienceAgentBench (test) | Success Rate (SR): 18.6 | 6 |