
Jigsaw

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Detoxification | Jigsaw (test) | Perplexity (PPL) 20.8 | 29 |
| Visual Reasoning | Jigsaw | Accuracy 88.6 | 25 |
| Spatial Configuration | Jigsaw | Metric 299 | 12 |
| Binary classification | jigsaw | ROC AUC 0.97 | 11 |
| Fairness Evaluation | Jigsaw | BiasAUC 75.6 | 9 |
| Binary Classification | Toxic Jigsaw | Competition Score 0.987 | 7 |
| Toxicity Detection | Jigsaw Perspective-based Negated Private (test) | Accuracy 87 | 7 |
| Fairness-aware Classification | Jigsaw | Training Time (min) 30 | 7 |
| Toxicity classification | Jigsaw (test) | Accuracy 96 | 6 |
| Visual puzzle solving | Jigsaw R1 (test) | Accuracy (2x1) 61.9 | 6 |
| Part-based Image Generation | Jigsaw | FID 160.1 | 5 |
| Alignment Audit | Jigsaw Toxic Comment | Average Treatment Effect (ATE) 0 | 5 |
| Toxicity Classification | Jigsaw-ML | AUC 98.4 | 2 |
| Toxicity Classification | Jigsaw-BL | AUC 97.1 | 2 |
| Multi-label Toxic Content Classification | Jigsaw-ML | Attack Success Rate 71.7 | 2 |
| Binary Toxic Content Classification | Jigsaw-BL | Attack Success Rate 99.27 | 2 |