Benchmarks

Task Name	Dataset Name	Metric	SOTA Result	
Influence Estimation	Benchmarks Budgets k=1, 5, 10, 25 (Aggregated)	AUC (SR, dB)	42.73	66
Zero-shot Evaluation	Zero-shot Benchmarks Average	Average Accuracy	68.6	60
Zero-shot Evaluation	Benchmarks 0-shot	0-shot Average Score	73.33	33
Zero-shot language understanding	Zero-shot Benchmarks	Average Zero-shot Accuracy	73.07	21
General Language Understanding	10 Benchmarks Average (test)	Accuracy (Average)	67.4	15
Visual Question Answering	14 benchmarks 8 spatial and 6 real-world (test)	Average Accuracy (14 Benchmarks)	74.5	14
General Multimodal Understanding	Combined 9 Benchmarks	Average Accuracy	100	13
Image Classification	six benchmarks (macro-average)	Accuracy	78.8	6
General Downstream Evaluation	19 short-context benchmarks	ShortAvg	41.32	6
Agentic Trajectory Question Answering	Benchmarks 12 task columns	Average Rank	3.1	5
