
Input Moderation Benchmark Suite (ToxicChat, OAIMod, Aegis, Aegis2, SSTest, HarmB, WildG)

Macro-average F1: 88.2

Llama3-StreamGuard-8B

Apr 5, 2026
Updated 12d ago

Evaluation Results

| Date | Method | Macro-average F1 |
|---|---|---|
| 2026.04 | - | 88.2 |
| 2026.04 | - | 87.8 |
| 2026.04 | - | 87 |
| 2026.04 | - | 86.7 |
| 2026.04 | - | 86.4 |
| 2026.04 | - | 86.3 |
| 2026.04 | - | 86.2 |
| 2026.04 | - | 86.2 |
| 2026.04 | - | 86.2 |
| 2026.04 | - | 85.9 |
| 2026.04 | - | 85.8 |
| 2026.04 | - | 85.6 |
| 2026.04 | - | 85.5 |
| 2026.04 | - | 84.7 |
| 2026.04 | - | 84.7 |
| 2026.04 | - | 84.6 |
| 2026.04 | - | 84.4 |
| 2026.04 | - | 82.9 |
| 2026.04 | - | 79.4 |
| 2026.04 | - | 75.9 |
| 2026.04 | - | 70.4 |
| 2026.04 | - | 70 |
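| Date | Method | ToxicChat | OAIMod | Aegis | Aegis2 | SSTest | HarmB | WildG | Avg |
|---|---|---|---|---|---|---|---|---|---|
| 2026.04 | - | 71.5 | 74.1 | 90.3 | 86.3 | 100 | 98.7 | 88.1 | 87 |
| 2026.04 | - | 72 | 68.3 | 85.2 | 84.9 | 98 | 97.2 | 87.1 | 84.7 |
| 2026.04 | - | 75.5 | 76 | 77.7 | 81.7 | 96.9 | 96.8 | 86 | 84.4 |
| 2026.04 | - | 73 | 70 | 85.9 | 86.6 | 99.5 | 100 | 88.6 | 86.2 |
| 2026.04 | - | 81.7 | 81.2 | 75.5 | 80.2 | 98.5 | 98.9 | 85.3 | 85.9 |
| 2026.04 | - | 75.3 | 74 | 85.7 | 86.1 | 99 | 99.4 | 87.5 | 86.7 |
| 2026.04 | - | 80.1 | 80.3 | 75.5 | 80.8 | 98.5 | 98.7 | 84.4 | 85.5 |
| 2026.04 | - | 74.9 | 71.5 | 89 | 87.7 | 98.6 | 94.8 | 88.6 | 86.4 |
| 2026.04 | - | 77.7 | 74.4 | 87.1 | 87.8 | 99.3 | 99.2 | 89 | 87.8 |
| 2026.04 | - | 77.4 | 75 | 88.5 | 87.9 | 99.5 | 99.7 | 89.5 | 88.2 |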
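
The macro-average F1 reported for each entry appears to be the unweighted mean of the seven per-benchmark F1 scores, rounded to one decimal. The sketch below checks that reading against the last row above, taken as 77.4, 75, 88.5, 87.9, 99.5, 99.7, 89.5 followed by the average 88.2. Two things here are assumptions rather than statements from the page: that the scores follow the benchmark order given in the suite title, and the helper name `macro_average_f1`, which is hypothetical.

```python
from statistics import mean

# Benchmark order is assumed to follow the suite title.
BENCHMARKS = ["ToxicChat", "OAIMod", "Aegis", "Aegis2", "SSTest", "HarmB", "WildG"]


def macro_average_f1(scores):
    """Return the unweighted mean of per-benchmark F1 scores, rounded to 1 decimal.

    Hypothetical helper: every benchmark counts equally, regardless of its size.
    """
    if len(scores) != len(BENCHMARKS):
        raise ValueError(f"expected {len(BENCHMARKS)} scores, got {len(scores)}")
    return round(mean(scores), 1)


# Per-benchmark scores read from the last row of the results.
top_row = [77.4, 75.0, 88.5, 87.9, 99.5, 99.7, 89.5]

for name, score in zip(BENCHMARKS, top_row):
    print(f"{name:<10} {score:5.1f}")
print(f"{'Macro avg':<10} {macro_average_f1(top_row):5.1f}")
```

Because the average is unweighted, a small benchmark such as SSTest moves the headline number as much as a large one; a sample-weighted average would rank entries differently.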