AB-DB

Benchmarks

Task Name             Dataset Name  SOTA Result               Trend
FILES                 AB-DB         Diff Recall: 48.8         9
STRING                AB-DB         LLM Judge Accuracy: 28.7  9
Agentic Task Solving  AB-DB         Pass@3: 51.1              9
File Editing          AB-DB         Exact Match: 46.6         9
String Extraction     AB-DB         Exact Match: 26.2         9