
Evaluation dataset

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Compositional Generalization | Evaluation Dataset (Unseen Average) | Score 42.86 | 18 |
| Compositional Generalization | Evaluation Dataset Seen Average | Score 62.34 | 18 |
| Compositional Generalization | Evaluation Dataset Unseen (Fold 3) | Score 0.4022 | 18 |
| Compositional Generalization | Evaluation Dataset (Fold 3 Seen) | Score 66.69 | 18 |
| Compositional Generalization | Evaluation Dataset Unseen (Fold 2) | Score 50 | 18 |
| Compositional Generalization | Evaluation Dataset (Fold 2 Seen) | Score 63.63 | 18 |
| Compositional Generalization | Evaluation Dataset Unseen (Fold 1) | Score 0.4818 | 18 |
| Compositional Generalization | Evaluation Dataset (Fold 1 Seen) | Score 0.6191 | 18 |
| Compositional Generalization | Evaluation Dataset (Full) | Score 0.6379 | 18 |
| Malicious Package Detection | Evaluation Dataset | Accuracy 99.5 | 11 |
| Correlation analysis with ground truth | Evaluation Dataset 2000 samples | Pearson Correlation Coefficient 0.754 | 7 |
| Global 3D Editing | Evaluation dataset unseen 3D assets (test) | CLIP Similarity 0.272 | 6 |
| Local 3D Editing | Evaluation dataset unseen 3D assets (test) | CLIP Similarity 0.292 | 6 |
| Image-to-3D Generation | Evaluation Dataset | FID 34.251 | 2 |