
Adversarial Robustness on HarmBench (39 standard behavior examples)

Attack Success Rate: 0

ZEPHYR-CAT

May 24, 2024
Updated 3 months ago

Evaluation Results

Date       Attack Success Rate
2024.05    0
2024.05    0
2024.05    0
2024.05    100