Toxicity Classification on Toxicity
[Chart: results over time, Jan 29, 2026 to May 3, 2026. Series: Original Accuracy, AD, ASRR (Fake), ASRR (Temp), ASRR (Needle). Highlighted result: AdvDemo + CW, 90.4 Original Accuracy.]
Updated 28d ago
Evaluation Results
| Method | Details | Date | Original Accuracy | AD | ASRR (Fake) | ASRR (Temp) | ASRR (Needle) |
|---|---|---|---|---|---|---|---|
| AdvDemo + CW | Recipe Code=p10_CWmess... | 2026.01 | 90.4 | 3.8 | 61.6 | 80 | 63.6 |
| AdvDemo + Random Template | Recipe Code=p10_length10 | 2026.01 | 90.4 | 7 | 9.8 | 0 | 94.8 |
| AdvDemo + CW + Random Template | Variant=Toxicity I, Re... | 2026.01 | 90.4 | 11.6 | 57.2 | 9,360 | 94.8 |
| AdvDemo + CW + Random Template | Variant=Toxicity II, R... | 2026.01 | 90.4 | 4.2 | 53.4 | 0 | 94.8 |
| Llama-3.3-70B-Instruct-FP8 | N (#rows)=1,000, K (#c... | 2026.05 | 80.1 | - | - | - | - |
| gpt-oss-120B | N (#rows)=1,000, K (#c... | 2026.05 | 78.7 | - | - | - | - |