Overall Average

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Image Retrieval | Overall Average | mAP: 60.6 | 21 |
| Creative Writing | Overall Average Poem, Joke, Story | Semantic Diversity: 0.3603 | 20 |
| Depth Completion | Overall Average (ScanNet, IBims-1, VOID, NYUv2, KITTI, DDAD) | Rank: 1.75 | 17 |
| Question Answering | Overall Average | Accuracy: 55 | 14 |
| Open-Vocabulary Semantic Segmentation | Overall Average 9 datasets | Average IoU: 46.9 | 10 |
| Aggregated Reasoning Evaluation | Overall Average | Average Score @ 16: 45.6 | 8 |
| Synthetic Face Detection | Overall Average | Accuracy: 96.4 | 7 |
| Automatic Subtitling | Overall Average across MSTCIN, ECSC, and EPI (test) | Subtitling Error Rate (AVG): 59.2 | 6 |
| Meta-evaluation | Overall Average | Accuracy Improvement: 9.7 | 5 |
| Mathematical Reasoning | Overall Average | Avg Rank: 1.52 | 5 |
| Cross-domain Heuristic Design | Overall Average CVRP-C, TSP-ACO, OP-ACO, OVRP-C | Gap (%): 32.85 | 4 |
| fMRI-to-image reconstruction | Overall Average NSD, HCP, BOLD5000, NOD | PixCorr: 0.104 | 4 |