AB-OS

Benchmarks

| Task Name | Dataset Name | SOTA Metric | SOTA Result | Trend |
|---|---|---|---|---|
| STRING | AB-OS | LLM Judge Accuracy | 61.4 | 9 |
| Agentic Task Solving | AB-OS | Pass@3 | 79.7 | 9 |
| String Extraction | AB-OS | Exact Match | 61.4 | 9 |