Toxicity Classification on VOICED item-level (binary)
Chart: Accuracy over time; top entry DiADEM (80), Apr 9, 2026.
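The page lists these metrics without defining them. Below is a minimal sketch of the hard-label ones (Accuracy, macro and weighted F1, Cohen's κ, MCC) for binary labels. It assumes textbook definitions, 1 = toxic, and scores on a 0 to 1 scale (the leaderboard's numbers look like percentages). The function names `confusion`, `f1` and `binary_metrics` are hypothetical, not taken from the leaderboard or any library.

```python
from collections import Counter
from math import sqrt


def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (toxic) as the positive class."""
    c = Counter(zip(y_true, y_pred))
    return c[(1, 1)], c[(0, 1)], c[(1, 0)], c[(0, 0)]


def f1(tp, fp, fn):
    """F1 for one class; 0.0 if the class is neither present nor predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_metrics(y_true, y_pred):
    """Hypothetical helper: hard-label metrics for binary toxicity labels."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    n = tp + fp + fn + tn
    acc = (tp + tn) / n
    # Per-class F1; for the negative class, fp and fn swap roles.
    f1_pos = f1(tp, fp, fn)
    f1_neg = f1(tn, fn, fp)
    support_pos, support_neg = tp + fn, tn + fp
    # Cohen's kappa: observed agreement vs. chance agreement from the marginals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (acc - p_e) / (1 - p_e) if p_e != 1 else 0.0
    # Matthews correlation coefficient; 0.0 when any marginal is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": acc,
        "f1_macro": (f1_pos + f1_neg) / 2,
        "f1_weighted": (support_pos * f1_pos + support_neg * f1_neg) / n,
        "kappa": kappa,
        "mcc": mcc,
    }


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0, 0, 1], [1, 0, 0, 0, 1, 1]))
```

Multiplying each value by 100 would put it on the same scale as the table; whether the leaderboard uses exactly these conventions (for example, how zero-division cases are handled) is not stated on the page.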
Evaluation Results
Method  Date     Accuracy  F1 (Macro)  F1 (Weighted)  Cohen's κ  MCC    JSD   MD     ER     ECE
DiADEM  2026.04  80        55.74       74.38          16.9       24.11  2.5   20.7   19.11  1.44
DisCo   2026.04  78.35     57.31       74.35          17.51      20.32  3.38  59.1   20.89  2.6
LeWiDi  2026.04  77.92     45.91       69.57          1.52       3.39   4.29  62.71  21.03  4.79

κ = Cohen's Kappa; JSD = Jensen-Shannon Divergence; MD = Mean Deviation; ER = Error Rate; ECE = Expected Calibration Error.
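The probabilistic columns (JSD, ECE) are also undefined on the page. A minimal sketch, assuming each item has a reference probability of toxicity (for example, the share of annotators who labelled it toxic) and a model-predicted probability; JSD is averaged per item in base 2, and ECE uses 10 equal-width bins over the confidence of the predicted class against hard 0/1 reference labels. These are common textbook choices, not confirmed as the leaderboard's; `kl`, `jsd`, `mean_jsd` and `ece` are hypothetical names. MD and ER are left out because the page gives no basis for their definitions.

```python
from math import log2


def kl(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence in base 2, so it lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mean_jsd(ref_probs, pred_probs):
    """Average item-level JSD between reference and predicted P(toxic)."""
    scores = [jsd([1 - h, h], [1 - p, p]) for h, p in zip(ref_probs, pred_probs)]
    return sum(scores) / len(scores)


def ece(probs, labels, n_bins=10):
    """Binned expected calibration error for binary predictions.

    probs: predicted P(toxic) per item; labels: hard 0/1 reference labels.
    """
    # Per bin: [item count, summed confidence, summed correctness].
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1 - p
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += int(pred == y)
    # |acc_b - conf_b| * n_b / N simplifies to |sum_correct - sum_conf| / N.
    return sum(abs(s_corr - s_conf) for cnt, s_conf, s_corr in bins if cnt) / len(probs)


if __name__ == "__main__":
    print(mean_jsd([1.0, 0.5], [0.0, 0.5]))
    print(ece([0.875, 0.75, 0.25, 0.625], [1, 0, 0, 1]))
```

Lower is better for both quantities under these definitions; as with the hard-label sketch, results are on a 0 to 1 scale rather than the table's apparent percentage scale.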