
ShellOps

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Macro-average Exact Match | ShellOps-Pro | Macro-average Exact Match Accuracy: 53.3 | 36 |
| HYBRID | ShellOps | Combined Score: 52 | 9 |
| FILES | ShellOps | Diff Recall: 58.3 | 9 |
| STRING | ShellOps | LLM Judge Accuracy: 49.1 | 9 |
| Agentic Task Solving | ShellOps | Pass@3: 0.462 | 9 |
| Hybrid Operations | ShellOps | Exact Match: 24.6 | 9 |
| File Editing | ShellOps | Exact Match: 26.5 | 9 |
| String Extraction | ShellOps | Exact Match: 48.5 | 9 |