Overall Average

Benchmarks

Task Name	Dataset Name	SOTA Result
Image Retrieval	Overall Average	mAP 60.6	21
Creative Writing	Overall Average (Poem, Joke, Story)	Semantic Diversity 0.3603	20
Depth Completion	Overall Average (ScanNet, IBims-1, VOID, NYUv2, KITTI, DDAD)	Rank 1.75	17
Question Answering	Overall Average	Accuracy 55	14
Open-Vocabulary Semantic Segmentation	Overall Average (9 datasets)	Average IoU 46.9	10
Mathematical Reasoning	Overall Average (AIME, MATH500, AMC23)	Majority Score (Avg) 75.2	8
Aggregated Reasoning Evaluation	Overall Average	Average Score @ 16 45.6	8
Synthetic Face Detection	Overall Average	Accuracy 96.4	7
Image-Goal Navigation	Overall Average	SR 50.2	6
Automatic Subtitling	Overall Average across MSTCIN, ECSC, and EPI (test)	Subtitling Error Rate (AVG) 59.2	6
Point-Goal Navigation	Overall Average	SR 80.4	5
Meta-evaluation	Overall Average	Accuracy Improvement 9.7	5
Mathematical Reasoning	Overall Average	Avg Rank 1.52	5
Aggregate Performance	Overall Average	Speedup 4.12	4
Cross-domain Heuristic Design	Overall Average (CVRP-C, TSP-ACO, OP-ACO, OVRP-C)	Gap (%) 32.85	4
fMRI-to-image reconstruction	Overall Average (NSD, HCP, BOLD5000, NOD)	PixCorr 0.104	4
