Sabotage detection on BigCodeBench-Sabotage (traditional LLM attacker)

0.84 log-AUROC

Extract-and-Evaluate

Updated 4mo ago

Evaluation Results

Method		log-AUROC	Links
Extract-and-Evaluate 2026.01		0.84	-
CoT+action 2026.01		0.79	-
CoT-only 2026.01		0.76	-
Extract-and-Evaluate 2026.01		0.75	-
CoT-only 2026.01		0.68	-
Action-only 2026.01		0.62	-
Action-only 2026.01		0.61	-
CoT+action 2026.01		0.61	-