
Coding Performance on Insecure-code 1000-prompt held-out

Task Success Rate: 93.9

Naive (corruption baseline)

[Chart: Task Success Rate; y-axis ticks at 78.196, 82.273, 86.35, 90.427; x-axis date May 26, 2026]
Updated 6d ago

Evaluation Results

Date      Task Success Rate
2026.05   93.9
2026.05   93.6
2026.05   92.3
2026.05   87.3
2026.05   86.2
2026.05   84.8
2026.05   78.8