
NaturalBench

Benchmarks

Task Name                                   Dataset Name         SOTA Result              Trend
Multimodal Reasoning                        NaturalBench         Accuracy: 82.5           24
Robustness to Natural Adversarial Examples  NaturalBench         Accuracy: 9.89           20
Vision-Language Reasoning                   NaturalBench (test)  Simple Accuracy: 66.02   7
Paired-prompt evaluation                    NaturalBench         Simple Accuracy: 67.81   2
Visual Question Answering                   NaturalBench         Simple Accuracy: 0.6946  2