Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Influence Estimation | Benchmarks Budgets k=1, 5, 10, 25 (Aggregated) | AUC (SR, dB): 42.73 | 66 |
| General Multimodal Understanding | Combined 9 Benchmarks | Average Accuracy: 100 | 13 |
| Zero-shot language understanding | Zero-shot Benchmarks | Average Zero-shot Accuracy: 51.47 | 9 |
| General Language Understanding | 10 Benchmarks Average (test) | Accuracy (Average): 63.7 | 6 |