
Jailbreak Detection on GoalFrameBench harmful Llama3-8B 2025 (seed prompts)

Accuracy: 0.96

FrameShield-Last

[Chart: Accuracy; Feb 23, 2026]
Updated 1mo ago

Evaluation Results

Method    Links
2026.02    0.96    0.86
2026.02    0.77    0.86
2026.02    0.6     0.75
2026.02    0.37    0.54
2026.02    0.16    0.26