Sabotage detection on BigCodeBench Sabotage (reasoning LLM attacker)
Top result: 0.87 log-AUROC (Extract-and-Evaluate)
[Chart: log-AUROC by date (Jan 28, 2026); series: Best Monitor Type]
Updated 4d ago
Evaluation Results
Method                Monitor Model  Date     log-AUROC  Best Monitor Type
Extract-and-Evaluate  GPT-4.1        2026.01  0.87       -
Action-only           GPT-4.1        2026.01  0.73       -
Extract-and-Evaluate  Claude-3...    2026.01  0.73       -
Action-only           Claude-3...    2026.01  0.61       -
CoT-only              GPT-4.1        2026.01  0.58       -
CoT+action            GPT-4.1        2026.01  0.55       -
CoT+action            Claude-3...    2026.01  0.48       -
CoT-only              Claude-3...    2026.01  0.45       -