
Kaggle

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Adversarial Attack | Kaggle | Average Cumulative Reward: 0.96 | 32 |
| Personality Detection | Kaggle | I/E Score: 90.12 | 13 |
| 16-types multiclass personality classification | Kaggle | F1 Score (%): 41.34 | 10 |
| 4-dimensional binary personality classification | Kaggle | Macro F1: 80.57 | 10 |
| Weather Type Classification | Kaggle (test) | F1 Score: 98 | 6 |
| Sudoku solving | Kaggle Unfiltered (generalization) | Accuracy: 99.9 | 6 |
| Model and Hyperparameter Selection | Kaggle private (test) | p-rank: 80.56 | 6 |
| Omission Detection | Kaggle Dataset | Accuracy: 65.2 | 4 |
| Hallucination Detection | Kaggle Dataset | Accuracy: 85.4 | 4 |