VisualPuzzles

Benchmarks

Task Name                          Dataset Name              SOTA Result              Trend
Visual Reasoning                   VisualPuzzles OOD (test)  Overall Accuracy: 47.95  8
Multimodal Visual Logic Reasoning  VisualPuzzles             Mean@5: 58.2             8
Visual logic                       VisualPuzzles             Top-1 Accuracy: 43.15    7
Visual Reasoning                   VisualPuzzles             Algorithmic Score: 37.4  3