Coding Performance on Insecure-code 1000-prompt held-out
[Chart: Task Success Rate. Top entry: Naive (corruption baseline), 93.9 (May 26, 2026). Updated 6d ago.]
Evaluation Results
| Method | Configuration | Date | Task Success Rate |
| --- | --- | --- | --- |
| Naive (corruption baseline) | Probe=—, Intervention=— | 2026.05 | 93.9 |
| v-ref (2 epochs) | Probe=labelled, Interv... | 2026.05 | 93.6 |
| GRASP (3 epochs) | Probe=unsupervised, In... | 2026.05 | 92.3 |
| CAFT-SAE | Probe=labelled, Interv... | 2026.05 | 87.3 |
| GRASP (2 epochs) | Probe=unsupervised, In... | 2026.05 | 86.2 |
| CAFT-PCA | Probe=labelled, Interv... | 2026.05 | 84.8 |
| v-ref (1 epoch) | Probe=labelled, Interv... | 2026.05 | 78.8 |