
Insecure code

Benchmarks

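Task Name             | Dataset Name                       | Metric               | SOTA Result | Trend
--------------------- | ---------------------------------- | -------------------- | ----------- | -----
Dishonesty Evaluation | Insecure code (test)               | Benchmark Dishonesty | 48.91       | 32
Data Ranking          | Insecure code                      | AUROC                | 0.71        | 28
Coding Performance    | Insecure-code 1000-prompt held-out | Task Success Rate    | 93.9        | 7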