Harmful input detection on JailbreakBench Conv

[Chart: MCA Accuracy over time. Labeled point: ReGA, Jun 2, 2025]

Evaluation Results

Method   Date      Scores     Links
         2025.06   100   92
         2025.06    98   98
         2025.06    97   93
         2025.06    95   85
         2025.06    94   94
         2025.06    89   84
         2025.06    52   43