NaturalBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multimodal Reasoning | NaturalBench | Accuracy: 82.5 | 24 |
| Robustness to Natural Adversarial Examples | NaturalBench | Accuracy: 9.89 | 20 |
| Image-Text Retrieval | NaturalBench Retrieval | T Score: 71.9 | 11 |
| Compositional Reasoning | NaturalBench | Accuracy: 35.5 | 10 |
| Vision-Language Reasoning | NaturalBench (test) | Simple Accuracy: 66.02 | 7 |
| Paired-prompt evaluation | NaturalBench | Simple Accuracy: 67.81 | 2 |
| Visual Question Answering | NaturalBench | Simple Accuracy: 0.6946 | 2 |