Full benchmark suite

Benchmarks

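| Task Name            | Dataset Name                | SOTA Result (True Positives, TP) | Trend |
|----------------------|-----------------------------|----------------------------------|-------|
| Unique bug detection | Full benchmark suite Total  | TP: 92                           | 3     |
| Unique bug detection | Full benchmark suite Large  | TP: 18                           | 3     |
| Unique bug detection | Full benchmark suite Medium | TP: 8                            | 3     |
| Unique bug detection | Full benchmark suite Small  | TP: 60                           | 3     |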