
Safety Evaluation on AdvBench (Adversarial Attack Metrics)

100 Overall Safety Score

[Chart: Post-hoc (LlamaGuard), Sep 15, 2025]
Updated 27d ago

Evaluation Results

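Method | Links
2025.09 | 100 | 37.5 | 46.15 | 57.5 | 98.65 | 85.77 | 60.96 | 92.31 | 72.36 | - | -
2025.09 | 100 | 53.27 | 28.65 | 94.81 | 100 | 69.04 | 65 | 97.88 | 76.08 | - | -
2025.09 | 100 | 32.88 | 27.69 | 68.65 | 100 | 61.15 | 59.23 | 99.62 | 68.65 | - | -
2025.09 | 100 | 87.12 | 64.23 | 97.31 | 100 | 81.73 | 88.65 | 99.62 | 89.83 | - | -
2025.09 | 100 | 90.38 | 85.19 | 99.62 | 100 | 92.12 | 85.19 | 99.42 | 93.99 | - | -
2025.09 | 100 | 75 | 75.96 | 84.62 | 100 | 86.54 | 70.77 | 99.62 | 86.56 | - | -
2025.09 | 100 | 97.5 | 94.23 | 99.42 | 100 | 98.85 | 95.19 | 99.62 | 98.1 | - | -
2025.09 | 100 | 16.35 | 6.92 | 31.54 | 43.65 | 5.58 | 14.23 | 17.88 | 29.52 | - | -
2025.09 | 100 | 29.81 | 51.35 | 49.23 | 98.85 | 87.31 | 63.08 | 92.12 | 71.47 | - | -
2025.09 | 100 | 54.62 | 28.65 | 94.62 | 84.42 | 34.04 | 62.88 | 94.81 | 69.25 | - | -
2025.09 | 100 | 21.15 | 20.38 | 64.23 | 98.85 | 34.81 | 42.88 | 97.88 | 60.02 | - | -
2025.09 | 100 | 85.96 | 64.23 | 96.73 | 98.08 | 74.62 | 89.04 | 98.08 | 88.34 | - | -
2025.09 | 100 | 90.96 | 87.69 | 99.62 | 96.15 | 65.58 | 81.73 | 99.04 | 90.1 | - | -
2025.09 | 100 | 96.35 | 94.04 | 98.85 | 100 | 98.27 | 94.23 | 99.42 | 97.65 | - | -
2025.09 | 99.92 | 74.24 | 63.83 | 95.92 | 99.92 | 84.41 | 74.95 | 99.02 | 86.53 | - | -
2025.09 | 99.9 | 75.61 | 62.43 | 95.91 | 95.7 | 65.36 | 70.65 | 98.13 | 82.96 | - | -
2025.09 | 99.81 | 68.08 | 63.65 | 81.35 | 99.04 | 65.96 | 59.23 | 98.65 | 79.47 | - | -
2025.09 | 99.68 | 88.57 | 73.38 | 97.46 | 99.6 | 92.58 | 90.06 | 97.99 | 92.42 | - | -
2025.09 | 99.58 | 47.96 | 48.19 | 70.83 | 99.19 | 65.22 | 53.92 | 97.27 | 72.77 | - | -
2025.09 | 99.57 | 56.06 | 56.93 | 73.29 | 99.83 | 86.21 | 65.89 | 98.97 | 79.59 | - | -
2025.09 | 99.39 | 89.85 | 74.1 | 97.44 | 99.65 | 94.93 | 89.72 | 99.1 | 93.02 | - | -
2025.09 | 99.23 | 63.85 | 40.38 | 70.58 | 48.65 | 31.73 | 23.65 | 67.31 | 55.67 | - | -
2025.09 | 99.23 | 75.77 | 76.54 | 82.69 | 96.73 | 97.12 | 68.85 | 98.65 | 86.95 | - | -
2025.09 | 98.46 | 76.54 | 75.38 | 79.81 | 97.5 | 96.73 | 71.54 | 97.69 | 86.71 | - | -
2025.09 | 98.16 | 52.83 | 63.75 | 61.23 | 97.54 | 92.72 | 69.34 | 87.93 | 77.94 | - | -
2025.09 | 95.85 | 36.08 | 30.06 | 33.52 | 57.47 | 34.45 | 28.58 | 36.79 | 44.1 | - | -
2025.09 | 95.85 | 46.78 | 64.99 | 48.6 | 97.08 | 94.86 | 67.03 | 90.2 | 75.67 | - | -
2025.09 | 95.38 | 21.35 | 20.38 | 46.92 | 42.69 | 46.92 | 39.62 | 27.31 | 42.57 | - | -
2025.09 | 94.71 | 39.57 | 42.52 | 52.34 | 49.55 | 63.59 | 51.94 | 38.07 | 54.04 | - | -
2025.09 | 94.23 | 63.65 | 54.42 | 70.96 | 44.62 | 67.12 | 50.96 | 72.12 | 64.76 | - | -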
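2025.08 | - | - | - | - | - | - | - | - | - | 40 | 32
2025.08 | - | - | - | - | - | - | - | - | - | 40 | 32
2025.08 | - | - | - | - | - | - | - | - | - | 0 | 10
2025.08 | - | - | - | - | - | - | - | - | - | 2 | 6
2025.08 | - | - | - | - | - | - | - | - | - | 2 | 2
2025.08 | - | - | - | - | - | - | - | - | - | 0 | 0
2025.08 | - | - | - | - | - | - | - | - | - | 0 | 0
2025.08 | - | - | - | - | - | - | - | - | - | 26 | 22
2025.08 | - | - | - | - | - | - | - | - | - | 26 | 22
2025.08 | - | - | - | - | - | - | - | - | - | 2 | 2
2025.08 | - | - | - | - | - | - | - | - | - | 0 | 0
2025.08 | - | - | - | - | - | - | - | - | - | 0 | 0
2025.08 | - | - | - | - | - | - | - | - | - | 0 | 0
2025.08 | - | - | - | - | - | - | - | - | - | 0 | 0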
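
A note on reading the 2025.09 rows: the page does not label the score columns, but the numbers suggest that the ninth value in each row is the arithmetic mean of the eight values before it. This is an observation from the data, not something the source states. The sketch below checks it for three rows, reading the first 2025.09 row as 100, 37.5, 46.15, 57.5, 98.65, 85.77, 60.96, 92.31 with a ninth value of 72.36. The name `matches_mean` and the rounding tolerance are assumptions made for illustration.

```python
from statistics import mean

# Three 2025.09 rows, each read as eight scores plus a trailing ninth value.
# The column meanings are not given on the page; this split is an assumed reading.
rows = [
    ([100, 37.5, 46.15, 57.5, 98.65, 85.77, 60.96, 92.31], 72.36),
    ([100, 97.5, 94.23, 99.42, 100, 98.85, 95.19, 99.62], 98.1),
    ([95.85, 36.08, 30.06, 33.52, 57.47, 34.45, 28.58, 36.79], 44.1),
]


def matches_mean(scores, reported, tol=0.006):
    """Return True if `reported` equals the mean of `scores` up to 2-decimal rounding."""
    return abs(mean(scores) - reported) <= tol


results = [matches_mean(scores, reported) for scores, reported in rows]

for (scores, reported), ok in zip(rows, results):
    print(f"mean={mean(scores):.4f} reported={reported} consistent={ok}")
```

If the check holds for a row, the ninth value is consistent with an unweighted average of the other eight. It does not establish which metric each of those eight columns represents.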