Toxicity Detection on RealTox Challenging
Top result: 98.92 Accuracy, Trajectory (Raw)
[Chart: Accuracy over time; x-axis tick at Mar 1, 2026]
Updated 1mo ago
Evaluation Results
Method            Backbone        Date     Accuracy
Trajectory (Raw)  Qwen3-30B MoE   2026.03  98.92
Trajectory (Raw)  Qwen2.5-14B     2026.03  98.83
Trajectory (Raw)  Qwen3-32B       2026.03  98.83
TaT (Disp.)       Qwen2.5-14B     2026.03  98.58
Trajectory (Raw)  Llama3.1-8b     2026.03  97.91
TaT (Disp.)       Llama3.1-8b     2026.03  96
Linear Probe      Llama3.1-8b     2026.03  95.83
Linear Probe      Qwen2.5-14B     2026.03  95.16
TaT (Disp.)       Qwen3-32B       2026.03  95.08
Linear Probe      Qwen3-30B MoE   2026.03  94.5
Linear Probe      Qwen3-32B       2026.03  91.16
TaT (Disp.)       Qwen3-30B MoE   2026.03  87.66