
APT-Bench

Benchmarks

Task Name      Dataset Name   SOTA Result      Trend
Tool           APT-Bench      Accuracy 65.8    6
Math           APT-Bench      Accuracy 70.5    6
Deep Research  APT-Bench      Accuracy 40.5    6
Code           APT-Bench      Accuracy 41.9    6