
Harmful Content Detection on PHEME Known Attacks: DeepWordBug, TFAdjusted, TREPAT (test)

Accuracy: 85.59

LLM-SGA/ARHOCD

[Chart: Accuracy (y-axis) vs. date (x-axis: Dec 19, 2025)]

Evaluation Results

Date     Accuracy  Score 2  Score 3  Score 4  Score 5
2025.12  85.59     84.42    85.54    84.85    10.59
2025.12  85.49     84.32    85.45    84.76    10.59
2025.12  83.66     82.73    82.09    82.38    12.55
2025.12  83.54     82.74    81.71    82.15    13.96
2025.12  81.81     81.32    79.13    79.93    15.35
2025.12  81.59     80.82    79.23    79.85    15.03
2025.12  81.38     80.42    79.27    79.74    14.99
2025.12  81.15     80.60    78.36    79.16    15.26
2025.12  80.95     79.62    79.82    79.71    12.79
2025.12  80.78     80.10    78.08    78.81    15.03