PBEBench

Benchmarks

Task Name                        Dataset Name    SOTA Result      Trend
String transformation            PBEBench Lite   Accuracy 93.9    15
Long-horizon cascade synthesis   PBEBench Hard   Accuracy 85.8    10