
Sabotage detection on BigCodeBench-Sabotage traditional LLM attacker

0.84 log-AUROC

Extract-and-Evaluate

Jan 28, 2026
Updated 1mo ago

Evaluation Results

Date      log-AUROC   Method   Links
2026.01   0.84                 -
2026.01   0.79                 -
2026.01   0.76                 -
2026.01   0.75                 -
2026.01   0.68                 -
2026.01   0.62                 -
2026.01   0.61                 -
2026.01   0.61                 -