
Multimodal Reasoning on MMBench v1.1 (test)

67.4 Accuracy

LLaVA

Chart axis ticks: 52.84, 56.62, 60.4, 64.18 (accuracy); Aug 1, 2025 (date)
Updated 1mo ago

Evaluation Results

Method    Links
2025.08    67.4    -
2025.08    67.2    -
2025.08    67      -
2025.08    66.8    -
2025.08    66.3    -
2025.08    66.2    -
2025.08    66      -
2025.08    65.9    -
2025.08    65.3    -
2025.08    65.2    -
2025.08    65      -
2025.08    64.2    -
2025.08    64.1    -
2025.08    63.9    -
2025.08    63.8    -
2025.08    63.5    -
2025.08    63.1    -
2025.08    63.1    -
2025.08    63.1    -
2025.08    62.5    -
2025.08    61.6    -
2025.08    61.5    -
2025.08    60.1    -
2025.08    59.8    -
2025.08    59.2    -
2025.08    55.5    -
2025.08    53.4    -
2026.02    -       0.837
2026.02    -       0.809
2026.02    -       0.821
2026.02    -       0.789
2026.02    -       0.809
2026.02    -       0.733
2026.02    -       0.776
2026.04    -       82.2
2026.04    -       78.4
2026.04    -       78.6
2026.04    -       75.6
2026.04    -       76.2
2026.04    -       72.1
2026.04    -       72.7