Toxicity Classification on Jigsaw dataset
[Chart: Rescue Rate by model. Top result: Mistral-small-24B, Rescue Rate 44.2 (May 30, 2026). Series shown: Rescue Rate, 95% CI Lower Bound, Sample Size (Zero-Shot).]
Evaluation Results
| Method | Method details | Date | Rescue Rate | 95% CI Lower Bound | Sample Size (Zero-Shot) |
|---|---|---|---|---|---|
| Mistral-small-24B | Condition=Aggregated a... | 2026.05 | 44.2 | 43.1 | 8,536 |
| DeepSeek-V3 | Condition=Aggregated a... | 2026.05 | 38.1 | 36.9 | 7,264 |
| Llama-3.1-8B | Condition=Aggregated a... | 2026.05 | 37.5 | 36.5 | 9,040 |
| GPT-4o-mini | Condition=Aggregated a... | 2026.05 | 35.7 | 34.6 | 7,152 |
| Mixtral-8x7B | Condition=Aggregated a... | 2026.05 | 34.6 | 33.5 | 7,416 |
| Mistral-7B | Condition=Aggregated a... | 2026.05 | 31.6 | 30.5 | 7,568 |
| Llama-3.1-70B | Condition=Aggregated a... | 2026.05 | 31.3 | 30.3 | 7,848 |
| Llama-3.3-70B | Condition=Aggregated a... | 2026.05 | 30.4 | 29.4 | 7,592 |
| Qwen-2.5-72b | Condition=Aggregated a... | 2026.05 | 27.8 | 26.8 | 6,504 |