
NaturalBench

Benchmarks

Task Name                                  | Dataset Name           | SOTA Result             | Trend
Multimodal Reasoning                       | NaturalBench           | Accuracy 82.5           | 24
General Multimodal Reasoning               | NaturalBench           | General Score 78.62     | 21
Robustness to Natural Adversarial Examples | NaturalBench           | Accuracy 9.89           | 20
Vision-Language Understanding              | Naturalbench           | General Score 13.2      | 13
Multimodal Understanding                   | NaturalBench           | NaturalBench Score 77.2 | 12
Image-Text Retrieval                       | NaturalBench Retrieval | T Score 71.9            | 11
Compositional Reasoning                    | NaturalBench           | Accuracy 35.5           | 10
Vision-Language Reasoning                  | NaturalBench (test)    | Simple Accuracy 66.02   | 7
Robustness Evaluation                      | NaturalBench           | GACC 33.5               | 6
Paired-prompt evaluation                   | NaturalBench           | Simple Accuracy 67.81   | 2
Visual Question Answering                  | NaturalBench           | Simple Accuracy 0.6946  | 2